Preface



The papers in this volume were presented at SWAT 2000, the Seventh Scandina- 
vian Workshop on Algorithm Theory. The workshop, which is really a conference, 
has been held biennially since 1988, rotating between the five Nordic countries 
(Sweden, Norway, Finland, Denmark, and Iceland). It also has a loose associa- 
tion with the WADS (Workshop on Algorithms and Data Structures) conference 
that is held in odd numbered years. SWAT is intended as a forum for researchers 
in the area of design and analysis of algorithms. The SWAT conferences are coor- 
dinated by the SWAT steering committee, which consists of B. Aspvall (Bergen), 
S. Carlsson (Lulea), H. Hafsteinsson (U. Iceland), R. Karlsson (Lund), A. Lingas 
(Lund), E. Schmidt (Aarhus), and E. Ukkonen (Helsinki). 

The call for papers sought contributions in all areas of algorithms and data 
structures, including computational geometry, parallel and distributed comput- 
ing, graph theory, and computational biology. A total of 105 papers were sub- 
mitted, out of which the program committee selected 43 for presentation. In 
addition, invited lectures were presented by Uriel Feige (Weizmann), Mikkel 
Thorup (AT&T Labs-Research), and Esko Ukkonen (Helsinki). 

SWAT 2000 was held in Bergen, July 5-7, 2000, and was locally organized by
a committee consisting of Pinar Heggernes, Petter Kristiansen, Fredrik Manne,
and Jan Arne Telle (chair), all from the Department of Informatics, University
of Bergen.

We wish to thank all the referees who aided in evaluating the papers. We 
also thank The Research Council of Norway (NFR) and the City of Bergen for 
financial support. 

Abstract. First we review amortized fully-dynamic polylogarithmic algorithms
for connectivity, minimum spanning trees (MST), 2-edge- and biconnectivity.
Second we discuss how they yield improved static algorithms: connectivity for
constructing a tree from homeomorphic subtrees, 2-edge connectivity for finding
unique matchings in graphs, and MST for packing spanning trees in graphs.

The application of MST for spanning tree packing is new and when
boot-strapped, it yields a fully-dynamic polylogarithmic algorithm for
approximating general edge connectivity within a factor $\sqrt{2} + o(1)$.

Finally, on the more practical side, we will discuss how output sensitive 
algorithms for dynamic shortest paths have been applied successfully to 
speed up local search algorithms for improving routing on the internet, 
roughly doubling the capacity. 

1 Dynamic Graph Algorithms 

In this talk, we will discuss some simple dynamic graph algorithms and their
applications within static graph problems. As a new result, we will derive a
fully dynamic polylogarithmic algorithm approximating the edge connectivity $\lambda$
within a factor $\sqrt{2} + o(1)$, that is, the algorithm will output a value between
$\lambda/(\sqrt{2} + o(1))$ and $\lambda \times (\sqrt{2} + o(1))$.

The talk is not intended as a general survey of dynamic graph algorithms and 
their applications. Rather its goal is just to present a few nice illustrations of the 
potent relationship between dynamic graph algorithms and their applications in 
static graph problems, showing contexts in which dynamic graph algorithms play 
a role similar to that played by priority queues for greedy algorithms. 

In a fully dynamic graph problem, we consider a graph G over a fixed
vertex set V, |V| = n. The graph G may be updated by insertions and deletions of
edges. Unless otherwise stated, we assume that we start with an empty edge set.
We will review the fully dynamic graph algorithms of Holm et al. [?] for connectivity,
minimum spanning trees (MST), 2-edge, and biconnectivity in undirected
graphs. For the connectivity type problems, the updates may be interspersed by
queries on (2-edge-/bi-) connectivity of the graph or between specified vertices.
For MST, the fully dynamic algorithm should update the MST in connection
with each update to the graph: an inserted edge might have to go into the MST,
and if an MST edge is deleted, we should replace it with the lightest edge possible.
Both updates and queries are presented on-line, meaning that we have to
respond to an update or query without knowing anything about the future.

The time bounds for these operations are polylogarithmic but amortized,
meaning that we only bound the average operation time over any sequence of
operations, starting with no edges. In our later applications for static graph 
problems, we only care about the total amount of time spent over all dynamic 
graph operations, and hence the amortized time bounds suffice. 

The above mentioned results are all for undirected graphs. For directed
graphs there are very few results. In a recent breakthrough, King [?] showed
how to maintain the full transitive closure of a graph in $\tilde{O}(n^2)$ amortized time
per update. Further, she showed how to maintain all pairs shortest paths in
$\tilde{O}(n^{2.5}\sqrt{\log C})$ time per update if C is the maximum weight in the graph. However,
if one is just interested in maintaining whether t can be reached from s
for two fixed vertices s and t, nobody knows how to do this in o(m) time.

On the more practical side, Ramalingam and Reps [?] have suggested a
lazy implementation of Dijkstra's [?] single source shortest paths algorithm for a
dynamic directed graph. If X is the set of vertices that change distance from
the source s in connection with an arc insertion or deletion, they can update a
shortest path tree from s in $\tilde{O}(\sum_{v \in X} \mathrm{degree}(v))$ time. Although this does not in
general improve over the $\tilde{O}(m)$ time it takes to compute a single source shortest
path tree from scratch, there has been experimental evidence suggesting that
this kind of laziness is worthwhile in connection with internet-like topologies [7].
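
To make the lazy idea concrete, here is a minimal Python sketch that repairs single source distances after the insertion of one arc, touching only vertices whose distance actually changes. It is a simplified illustration under our own assumptions, not the algorithm of Ramalingam and Reps: it handles insertions only (deletions need more care), and the names `dijkstra` and `insert_arc` are hypothetical.

```python
import heapq

def dijkstra(n, adj, s):
    """Distances from s computed from scratch; adj[u] is a list of (v, weight)."""
    dist = [float("inf")] * n
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def insert_arc(adj, dist, u, v, w):
    """Insert arc (u, v) with weight w and repair dist in place.

    Only vertices whose distance decreases are scanned, so the work is
    governed by their degrees. Returns the set of changed vertices.
    """
    adj[u].append((v, w))
    changed = set()
    if dist[u] + w >= dist[v]:
        return changed  # the new arc does not shorten anything
    dist[v] = dist[u] + w
    heap = [(dist[v], v)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        changed.add(x)
        for y, wy in adj[x]:
            if d + wy < dist[y]:
                dist[y] = d + wy
                heapq.heappush(heap, (dist[y], y))
    return changed
```

In a local search loop one would call `insert_arc` after each tentative change instead of rerunning `dijkstra`, and compare the two on small inputs as a sanity check.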

2 Applications 

We are now going to review some simple applications of dynamic graph algo- 
rithms for solving problems on static graphs. 



2.1 Dynamic Connectivity and the Construction of Trees 
from Homeomorphic Subtrees 

Our first application is of Henzinger, King, and Warnow [?]. We are given a set A
of leaves and a set X of triples $((a, b), c) \in A^3$. The problem is then, if possible, to
find a so-called consensus tree T with leaf set A such that for every $((a, b), c) \in X$,
the least common ancestor of a and b is a strict descendant of the least common
ancestor of b and c. The consensus tree problem was raised in 1981 by Aho,
Sagiv, Szymanski, and Ullman in the context of optimizing relational expressions
[?], and they presented an O(|A||A|) time solution. Since then the consensus
problem has found applications in computational biology [10, ?]. Henzinger,
King, and Warnow [?] have reduced the consensus tree problem to decremental
connectivity, thereby getting an O(|A|) bound.

An example of an application of fully dynamic connectivity, due to Karger
[?], is for identifying highly connected subgraphs in a randomized max-flow
algorithm. This application is, however, too complicated for the current talk.

2.2 Dynamic 2-Edge Connectivity and Matchings

The dynamic algorithms for 2-edge connectivity have proved useful in making 
efficient constructions in relation to some classical theorems in matching theory. 

A theorem of Petersen from 1891 [22] states that every bridgeless 3-regular
graph has a perfect matching. Biedl, Bose, Demaine, and Lubiw [?] have used
dynamic 2-edge connectivity to construct such a perfect matching in $\tilde{O}(n)$ time,
improving over the $O(n\sqrt{n})$ bound obtained using the general $O(m\sqrt{n})$ time bound
for matching when m = O(n).

A theorem of Kotzig from 1959 states that any unique perfect matching
contains a bridge. Using dynamic 2-edge connectivity for maintaining bridges,
Gabow, Kaplan, and Tarjan [?] used Kotzig's theorem to check if a graph has a
unique perfect matching, improving the running time from O(mn) to $\tilde{O}(m)$.

In this talk we will only review the simple and elegant construction of Gabow, 
Kaplan, and Tarjan. 
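
The following Python sketch shows how Kotzig's theorem drives such a check. It is a naive static version written for illustration, under our own assumptions: bridges are recomputed from scratch in every round rather than maintained by dynamic 2-edge connectivity, so it corresponds to the slower bound, and the function names are hypothetical. A bridge whose removal leaves two odd sides must be in every perfect matching; a bridge with an even side is in none and can be deleted; a component with edges but no bridge cannot have a unique perfect matching.

```python
from collections import deque

def find_bridges(adj):
    """All bridges of a simple undirected graph given as {vertex: set(neighbors)}."""
    disc, low, bridges, timer = {}, {}, [], [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.append((u, v))

    for s in adj:
        if s not in disc:
            dfs(s, None)
    return bridges

def side_size(adj, start, banned):
    """Number of vertices reachable from start without crossing the edge banned."""
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == set(banned) or y in seen:
                continue
            seen.add(y)
            queue.append(y)
    return len(seen)

def unique_perfect_matching(n, edges):
    """Return the unique perfect matching as a set of pairs, or None if the
    graph has no perfect matching or more than one."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    matching = set()
    while any(adj.values()):
        bridges = find_bridges(adj)
        if not bridges:
            return None  # bridgeless component with edges (Kotzig)
        u, v = bridges[0]
        if side_size(adj, u, (u, v)) % 2 == 1:
            matching.add((min(u, v), max(u, v)))  # forced matching edge
            for x in (u, v):
                for y in adj.pop(x):
                    adj[y].discard(x)
        else:
            adj[u].discard(v)  # this bridge is in no perfect matching
            adj[v].discard(u)
    return matching if not adj else None  # leftover vertices are unmatched
```

The recursive depth-first search is only suitable for small graphs; it is here to keep the sketch short.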



2.3 Dynamic MST, Tree Packing, and Edge Connectivity 

The dynamic MST algorithm can be used directly to speed up $(1-\varepsilon)$-approximate
tree packing based on the Lagrangian relaxation techniques suggested by Plotkin,
Shmoys, and Tardos [23], with the refinements of Young [25]. Using a theorem of
Nash-Williams [21], this leads to a $(\sqrt{2}+\varepsilon)$-approximation of the edge connectivity
of a graph. What makes all this really interesting is that the construction
can be made fully dynamic, implying that we can maintain the edge connectivity
of a graph within a factor $\sqrt{2} + o(1)$ in polylogarithmic amortized time per
operation.

The above construction is new and the details are presented in Section 3.



2.4 Dynamic Shortest Paths and Local Search 
for Routing on the Internet 

Our last application is of practical nature, illustrating how dynamic graph algorithms
can be used to speed up local search [?]. The general strategy in local
search is to iteratively improve some feasible solution by considering small
changes to it. In the context of graphs, this small change may be the insertion
or deletion of an edge, or just a weight change. We can then speed up the local
search if we can find an efficient solution to the fully-dynamic graph problem of
maintaining the objective function under edge updates and weight changes.

This general strategy was recently used in a local search for improving the
capacity of the internet by Fortz et al. [?]. Currently, Open Shortest Path First
(OSPF) [?] is the most widely used intra-domain internet routing protocol.
Packets are routed along shortest paths to their destination. The weights of
the links, and thereby the shortest path routes, can be changed by the network
operator. The weights could be set proportional to their physical distances, but
often the main goal is to avoid congestion, i.e. overloading of links, and the
standard heuristic recommended by Cisco is to make the weight of a link inversely
proportional to its capacity.

A natural question is if one can improve the weight setting given some estimate
of the demands. In the work of Fortz et al., this was tested both for a proposed
AT&T WorldNet backbone, and for various synthetic networks, and weight settings
were found allowing for a 50-110% increase in the demands over what is achieved with
standard weight setting heuristics.

The approach of Fortz et al. is a local search algorithm where one repeatedly tries to
make one or a few weight changes, and sees if this improves the routing. In order to
simulate and evaluate the routing relative to these changes, we need to recompute
all pairs shortest paths. The networks considered were all sparse (m < 4n), so
the $\tilde{O}(n^{2.5}\sqrt{\log C})$ time algorithm of King [?] would be worse than recomputing
from scratch in $\tilde{O}(n^2)$ time. Instead we used the lazy approach of Ramalingam
and Reps [?]. Even though the networks considered were comparatively small
(n < 100), this led to a speed-up from around 20 hours down to about 1.5
hours, thus making the programs much more attractive to the business units at
WorldNet.

3 Tree Packings and Edge Connectivity 

We will now present the new results on tree packing and fully dynamic edge 
connectivity. The two concepts are strongly tied, as detailed below. 

The edge connectivity of a graph is the minimal number of edges whose
removal disconnects the graph. We will denote this number by $\lambda(G)$, or just $\lambda$ when
G is understood. Note that $\lambda n \le 2m$ since the minimal degree is an upper bound
on $\lambda$.

A tree packing in G is an assignment of weights w(T) to the spanning trees T of G so
that each edge e gets load

$$\ell(e) = \sum_{T : e \in T} w(T) \le 1.$$

The value of the tree packing is $\sum_T w(T)$. We let $\tau$ denote the value of the
maximal tree packing of G.

Theorem 1 (Nash-Williams [21]). $\lambda/2 < \tau \le \lambda$.

Above, $\tau \le \lambda$ follows directly from the fact that any cut is crossed by all spanning
trees.
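
As a small worked check of Theorem 1 (an illustration added here, not part of the talk), consider the cycle $C_n$ on $n \ge 3$ vertices with edges $e_1, \ldots, e_n$, writing $\lambda$ for the edge connectivity and $\tau$ for the value of a maximal tree packing. Removing any single edge leaves a spanning path, so there are exactly n spanning trees.

```latex
\begin{align*}
\lambda(C_n) &= 2, \\
w(T_j) &= \tfrac{1}{n-1} \quad \text{for each spanning tree } T_j = C_n \setminus \{e_j\},\ j = 1, \ldots, n, \\
\ell(e) &= (n-1) \cdot \tfrac{1}{n-1} = 1 \quad \text{for every edge } e, \\
\tau(C_n) &= \tfrac{n}{n-1}, \qquad \text{so } \tfrac{\lambda}{2} = 1 < \tfrac{n}{n-1} \le 2 = \lambda .
\end{align*}
```

No packing can do better: every spanning tree uses n - 1 edges, so a packing of value W puts total load (n - 1)W on the n edges, each of which carries load at most 1.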

We are going to use fully dynamic MST to speed up a $(1-\varepsilon)$-approximate
tree packing based on Lagrangian relaxation [23, 25], improving the running time
from $\tilde{O}(\lambda m)$ to $\tilde{O}(m)$. For comparison, the best exact tree packing of Gabow [?]
takes $\tilde{O}((\lambda n)^2)$ time.

Next, we will argue that this $(1-\varepsilon)$-approximate tree packing can be maintained
dynamically, using polylogarithmic amortized time per operation if $1/\varepsilon$
and the edge connectivity are polylogarithmic. Rounding and multiplying our
estimated packing value by $\sqrt{2}$, we will approximate the edge connectivity within
a factor $\sqrt{2}$.

Applying the above construction to a logarithmic number of random subgraphs
G(p), p = 1, 1/2, ..., with high probability, we end up maintaining edge
connectivity within a factor $\sqrt{2} + o(1)$ in polylogarithmic amortized time per
operation.

For fully-dynamic edge connectivity, the best exact algorithm spends $\tilde{O}(\lambda n)$ time
per update, combining the $\tilde{O}(m)$ randomized static edge connectivity algorithm
of Karger [?] with the sparse certificates of Nagamochi and Ibaraki [?], and
the dynamic sparsification technique of Eppstein et al. [?].

A previous randomized fully dynamic edge connectivity algorithm
has been suggested by Karger [?]. For any $\alpha > 0$, it produces a $\sqrt{1 + 2/\alpha}$-approximation
with $\tilde{O}(n^{\alpha})$ amortized update time if combined with the fully
dynamic polylogarithmic MST technique from [?]. For example, with $\alpha = 1/2$,
it uses $\tilde{O}(\sqrt{n})$ amortized update time to get an approximation factor of $\sqrt{5}$, as
opposed to our $\sqrt{2} + o(1)$ factor with polylogarithmic amortized update time.

3.1 Lagrangian Tree Packing 

Young has verified [personal communication, 1999] that his variant [25] of the
Lagrangian packing technique of Shmoys, Plotkin, and Tardos [23] implies the
following result when specialized to tree packing:

Theorem 2 (Shmoys et al. [23], Young [25]). The following algorithm produces
a tree packing of value W where $(1-\varepsilon)\tau \le W \le \tau$.

1. Initially no spanning tree has any weight.

2. Set W := 0.

3. While no edge has load 1

(a) Pick a load-minimal spanning tree T.

(b) $w(T) := w(T) + \varepsilon^2/(3 \ln m)$.

(c) $W := W + \varepsilon^2/(3 \ln m)$.

4. Return W.
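
The steps of this algorithm can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: it recomputes each load-minimal spanning tree from scratch with Kruskal's algorithm (the static variant, not the dynamic MST speed-up), it assumes a connected graph with at least two edges, and with floating point increments the final load may overshoot 1 by less than one increment. The names `min_load_spanning_tree` and `lagrangian_tree_packing` are hypothetical.

```python
import math

def min_load_spanning_tree(n, edges, load):
    """Kruskal's algorithm with the current loads as edge weights.

    Returns the indices of the tree edges; ties go to the lower index.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for i in sorted(range(len(edges)), key=lambda i: (load[i], i)):
        ru, rv = find(edges[i][0]), find(edges[i][1])
        if ru != rv:
            parent[ru] = rv
            tree.append(i)
    return tree

def lagrangian_tree_packing(n, edges, eps):
    """Return the packing value W accumulated by the greedy loop."""
    m = len(edges)
    delta = eps ** 2 / (3 * math.log(m))  # weight added per iteration; needs m >= 2
    load = [0.0] * m
    W = 0.0
    while max(load) < 1.0:  # "while no edge has load 1"
        for i in min_load_spanning_tree(n, edges, load):
            load[i] += delta
        W += delta
    return W
```

For example, `lagrangian_tree_packing(4, [(0, 1), (1, 2), (2, 3), (3, 0)], 0.5)` packs spanning trees of a 4-cycle, whose maximal packing value is 4/3.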

Since each iteration increases the value of the tree packing by $\varepsilon^2/(3 \ln m)$, the
above algorithm must terminate in $3\tau \ln m/\varepsilon^2 \le 3\lambda \ln m/\varepsilon^2$ iterations. Using a
standard static MST algorithm, this takes $\tilde{O}(\lambda m \ln m/\varepsilon^2)$ time.

Using the dynamic MST algorithm from [?], we get

Theorem 3. A $(1-\varepsilon)$-approximate tree packing can be constructed in $\tilde{O}(m/\varepsilon^2)$
time.

Proof. We use the dynamic MST algorithm from [?] to maintain the load-minimal
spanning tree in the algorithm from Theorem 2. In each iteration, we first
make a copy T of the current spanning tree, and then, for each edge $e \in T$, we
increase the load by $\varepsilon^2/(3 \ln m)$. Since no load gets beyond 1, the total number of
load increases is $O(m \log m/\varepsilon^2)$, and each load increase is supported in $O(\log^4 n)$
amortized time.

By Theorem 1, if we multiply the found tree packing value by $\sqrt{2}$, we immediately
get a deterministic $\tilde{O}(m)$ time $(\sqrt{2} + o(1))$-approximation of edge connectivity,
matching a result of Matula [?].

3.2 Fully-Dynamic Packing with Small Cuts

We are now going to present a fully dynamic version of the algorithm from
the previous section, assuming that we are only interested in edge connectivity
$\le \lambda_{\max}$. Also, assume, for the moment, that the graph remains connected.

We will pack $q = 3\lambda_{\max} \ln m/\varepsilon^2$ spanning trees $T_1, \ldots, T_q$, each with weight 1.
We will not have any limits on how much load can be put on the edges. Using the
MST data structure from [?], each spanning tree $T_i$ is a load-minimal spanning
tree where the load of edge e is

$$\ell_{i-1}(e) = |\{T_j : j < i,\ e \in T_j\}|.$$



For technical reasons, we assign unique priorities to the edges so that edges of 
higher priority are preferred. Then the MST is unique. The priority of an edge 
is the same for all the different values of i. 

The maximal load in our packing is

$$L = \max_{e \in E} |\{T_i : e \in T_i\}|.$$

Scaling down the weights by L to give maximal load 1, we get a tree packing of
value q/L.

If $q/L > \lambda_{\max}$, we conclude that the graph has edge connectivity $> \lambda_{\max}$.
Otherwise, by Theorem 2, $(1-\varepsilon')\tau \le q/L \le \tau$, where $\varepsilon'^2/(3 \ln m) = 1/L \le
\lambda_{\max}/q = \varepsilon^2/(3 \ln m)$, so $\varepsilon' \le \varepsilon$.
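
A static version of this construction is easy to sketch in Python: build q unit-weight trees one after another, each a minimum spanning tree with respect to the loads left by its predecessors, and report q/L. This is only an illustration under our own assumptions (connected input graph, lower edge index standing in for higher priority, trees rebuilt from scratch instead of maintained dynamically); `greedy_unit_packing` is a hypothetical name.

```python
def greedy_unit_packing(n, edges, q):
    """Pack q unit-weight spanning trees greedily and return the value q / L."""
    load = [0] * len(edges)
    for _ in range(q):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        # Kruskal on (load, index): a unique load-minimal spanning tree.
        tree = []
        for i in sorted(range(len(edges)), key=lambda i: (load[i], i)):
            ru, rv = find(edges[i][0]), find(edges[i][1])
            if ru != rv:
                parent[ru] = rv
                tree.append(i)
        for i in tree:
            load[i] += 1
    L = max(load)  # maximal load; scaling weights by 1/L gives a valid packing
    return q / L
```

On the 4-cycle with q = 4 the loads end at 3 on every edge, so the sketch returns 4/3, which is the maximal packing value of that graph.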

For the analysis of the update time of the above algorithms, we need the 
following lemma: 

Lemma 1. Each insertion or deletion of an edge changes at most i edge loads
$\ell_i$ for each i.

Proof. By symmetry, it suffices to consider the deletion of an edge e.
Since we assumed the graph remains connected, the total load remains $i(n-1)$.
When e is deleted, its load drops to 0, so the load increase over all edges
$f \ne e$ is $\le i$. Consequently, it suffices to prove that loads on edges $f \ne e$
do not decrease. By induction over i, we may assume $\ell^{new}_{i-1}(f) \ge \ell^{old}_{i-1}(f)$.

Now, if $f \notin T_i^{old} \setminus T_i^{new}$, then $\ell^{new}_i(f) - \ell^{new}_{i-1}(f) \ge \ell^{old}_i(f) - \ell^{old}_{i-1}(f)$,
so $\ell^{new}_i(f) \ge \ell^{old}_i(f)$, as desired. Suppose instead that $f \in T_i^{old} \setminus T_i^{new}$. Since the edges
have unique priorities and no $\ell_{i-1}$ load has decreased, $f \in T_i^{old} \setminus T_i^{new}$ implies
$\ell^{new}_{i-1}(f) > \ell^{old}_{i-1}(f)$. Hence $\ell^{new}_i(f) \ge \ell^{old}_{i-1}(f) + 1 \ge \ell^{old}_i(f)$. This
completes the proof of the lemma.



Theorem 4. For graphs with edge connectivity $\le \lambda_{\max}$, we can maintain a
$(1-\varepsilon)$-approximate tree packing in $O(\lambda_{\max}^2\,\mathrm{polylog}\, n/\varepsilon^4)$ amortized time per
update. The algorithm announces that the edge connectivity is $> \lambda_{\max}$ before it
gives a worse approximation.



Dynamic Graph Algorithms with Applications 






Proof. The time spent on maintaining the MST data structure from [?] for
each $T_i$ is amortized over load changes. Each load change takes polylogarithmic
amortized time. Hence the result follows from Lemma 1, stating that the total
number of load changes per edge update is bounded by $\sum_{i \le q} i < q^2$.

If we remove the condition that the graph remains connected, each $T_i$ should
instead be a load-minimal spanning forest, that is, a load-minimal forest with
a tree spanning each connected component. The algorithm from [?] maintains
such minimal spanning forests. If the graph is disconnected, instead of returning
q/L, we just return 0. Concerning the analysis, if a bridge e from some component
is deleted/inserted, this simply has the effect of deleting/inserting e in each $T_i$,
thus giving q dynamic MST operations. To analyze the deletion or insertion
of a non-bridge, we simply apply the analysis from Lemma 1 to the affected
component, thus getting the same $\sum_{i \le q} i < q^2$ bound as above.

Corollary 1. For graphs with polylogarithmic edge connectivity, we can maintain
the edge connectivity within a factor $\sqrt{2}$ in polylogarithmic time per update,
and further, announce if the graph is too connected for the approximation guarantee.

Proof. Since the edge connectivity is an integer, we just set $\varepsilon = 1/(6\lambda_{\max})$, where
$\lambda_{\max}$ is polylogarithmic, and round the value of the tree packing to the nearest
multiple of 1/2 to get a value W in $[\lambda/2, \lambda]$. Afterwards we return $\sqrt{2}W$.

3.3 Larger Edge Connectivity 

The approach from the previous section gives polylogarithmic time bounds for
polylogarithmic edge connectivity. In order to approximate larger edge connectivity,
we consider random subgraphs of the graph in question. Let G(p) denote
the random subgraph of G including each edge of G independently with probability p.

Lemma 2 (Karger [?]). Let G be a graph with edge connectivity $\lambda$ and let
$p\lambda > 6 \ln n/\varepsilon^2$. Then the probability that the value of any cut in G(p) differs by
more than a factor $(1 + \varepsilon)$ from its expected value is O(1/n).

It is now easy to provide a $(\sqrt{2} + o(1))$-approximation dynamic algorithm for edge
connectivity:

— For p = 1, 1/2, 1/4, 1/8, ..., let $H_p = G(p)$. That is, whenever an edge is
inserted in G, it is inserted in $H_p$ with probability p.

— For each $H_p$, maintain edge connectivity $\le \log^2 n$ as described in Corollary 1.

— After each edge update, let $p_{\max}$ be the largest value of p for which the
algorithm from Corollary 1 does not report that the edge connectivity is too
large.

— Our approximate edge connectivity is the one approximated for $H_{p_{\max}}$ divided
by $p_{\max}$.

Theorem 5. There is a randomized fully dynamic algorithm supporting updates
in amortized polylogarithmic time that with high probability approximates edge
connectivity within a factor $\sqrt{2} + o(1)$.

Proof. Since we only maintain a logarithmic number of graphs $H_p$, the time
bound is immediate from Corollary 1.

If $\lambda = O(\log n)$, we will have $p_{\max} = 1$, and hence we get a factor $\sqrt{2}$
directly from Corollary 1. However, for $\lambda = \omega(\log n)$, the result is immediate
from Lemma 2.

Abstract. We review several approaches of coping with NP-hardness,
and see how they apply (if at all) to the problem of computing the 
bandwidth of a graph. 



1 Introduction 

An important goal of the theory of algorithms is to produce efficient algorithms 
that solve computationally difficult problems. When considering the class of 
NP-hard combinatorial optimization problems, this goal is beyond our reach. 
As these problems need to be solved routinely, we seek ways of coping with 
NP-completeness. Perhaps the most common approaches are the following: 

— Easy special cases. Do not solve the problem in its full generality. Identify 
properties of the input instances that make the problem easier, and design 
an algorithm that makes use of these properties. 

— Somewhat efficient algorithms. Design an algorithm that always solves 
the problem whose running time is not polynomial, but still much faster 
than exhaustive search. This approach may be useful for inputs of moderate 
size. 

— Approximation algorithms. Sacrifice the quality of the solution so as 
to obtain more efficient algorithms. Instead of finding the optimal solution, 
settle for a near optimal solution. Hopefully, this makes the problem easier. 

— Heuristics. Design algorithms that work well on many instances, though 
not on all instances. This is perhaps the approach most commonly used in 
practice. 

We review the four approaches mentioned above. For concreteness, we do so 
in the context of one particular NP-hard combinatorial optimization problem - 
that of computing a linear arrangement with minimum bandwidth for a graph. 
Given an n-vertex graph, a linear arrangement is a numbering of the vertices 
from 1 to n (which can be viewed as a layout of the graph vertices on a line) and 
its bandwidth is the maximum difference in numbers given to the endpoints of an 
edge (the maximum stretch of an edge on the line). The minimum bandwidth 
problem asks for a linear arrangement of minimum bandwidth. This problem is
NP-hard [?].

2 Easy Special Cases



The bandwidth problem is a graph problem. By sufficiently restricting the class
of graphs considered, one can identify classes of graphs for which the bandwidth
can be computed in polynomial time (e.g., interval graphs [?]). Somewhat surprisingly,
unlike many NP-complete problems, the bandwidth remains NP-hard
on trees [?].

To see what kind of restriction on the input graph may be relevant to practi- 
cal applications, let us consider a typical scenario in which the graph bandwidth 
problem arises. This scenario is that of minimizing the bandwidth of a matrix. 
For a symmetric matrix M, we say that its bandwidth is b if all its nonzero 
entries lie on entries at most b locations away from the diagonal. It is relatively 
easy to store and manipulate matrices of low bandwidth (e.g., diagonal matrices 
or tridiagonal matrices). Sometimes a symmetric matrix has large bandwidth, 
but can be transformed into a low bandwidth matrix just by renaming - ap- 
plying a permutation on its rows, and the same permutation on its columns (to 
preserve symmetry). Finding a transformation that minimizes the bandwidth is 
equivalent to finding a linear arrangement of minimum bandwidth for a graph 
whose adjacency matrix is derived from M by replacing every nonzero entry 
by 1. 
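
The correspondence between matrix bandwidth and linear arrangements can be illustrated with a few lines of Python. This is a small sketch added for illustration; the helper names `matrix_bandwidth`, `permute` and `arrangement_bandwidth` are our own and not from the text.

```python
def matrix_bandwidth(M):
    """Largest |i - j| over the nonzero entries M[i][j] (0 if there are none)."""
    return max((abs(i - j)
                for i, row in enumerate(M)
                for j, x in enumerate(row) if x != 0), default=0)

def permute(M, order):
    """Apply the same permutation to rows and columns:
    new row/column k is old row/column order[k]."""
    return [[M[a][b] for b in order] for a in order]

def arrangement_bandwidth(edges, position):
    """Maximum stretch of an edge when vertex v is placed at position[v]."""
    return max((abs(position[u] - position[v]) for u, v in edges), default=0)

# Adjacency matrix of the path 0 - 2 - 1 - 3 under the given vertex names.
M = [[0, 0, 1, 0],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 0]]
```

Here `matrix_bandwidth(M)` is 2, while renaming with `permute(M, [0, 2, 1, 3])` yields a tridiagonal matrix of bandwidth 1, matching the arrangement that lists the path vertices in order.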

Minimizing the bandwidth of a matrix is worth the trouble only if the resulting
matrix has small bandwidth. Hence a class of graphs for which the bandwidth
problem is especially interesting is that of graphs with small bandwidth. Accordingly,
we may wish to study the complexity of the bandwidth problem as a function
of b, the minimum bandwidth of the input graph.

For the case b = 1, the graph has to be a collection of paths, and finding a
linear arrangement of minimum bandwidth is trivial. For the case b = 2, Garey,
Graham, Johnson and Knuth design a linear time algorithm for finding a
linear arrangement with this bandwidth. They ask whether the case b = 3 is NP-hard
or polynomially solvable. This was answered by Saxe [?], who designed an
algorithm that finds a linear arrangement of bandwidth b (if one exists) in time
roughly $O(n^{b+1})$. The algorithm uses dynamic programming and is sketched
below.

Build a linear arrangement of bandwidth at most b by extending partial
linear arrangements (that include the first j vertices in the linear arrangement)
by one more vertex, in all possible consistent ways. Namely, add one more vertex,
such that no edge so far has length more than b, and moreover, such that this
vertex did not previously appear in the linear arrangement. This last condition
is the tricky one - how do we know which vertices appeared in the prefix?

To solve the above problem, we make the following two observations: 

1. A graph has bandwidth at most b iff each of its connected components has
bandwidth at most b. Hence we may w.l.o.g. assume that the graph is connected.
This is a simple observation, but essential to the success of the algorithm.

2. Any set of consecutive vertices in a linear arrangement of bandwidth b has
at most 2b neighbors in G.

From a prefix of a linear arrangement of length j, we shall record only: 

— The last b vertices, which we call the “buffer” vertices. 

— Which of their 2b neighbors have not yet appeared in the linear arrangement. 
We call these the “dangling” vertices. (In the dynamic programming algo- 
rithm, maintain only those partial linear arrangements in which the number 
of neighbors of the buffer vertices is at most 2b.) 

The point is that this representation implicitly includes all vertices in the
prefix - those are the vertices which after the removal of the dangling vertices
still have a path to some buffer vertex (due to observation 1 above). The amount
of information we need to record is at most $n^b$ possibilities for the buffer vertices, and at
most $2^{2b}$ possibilities for the dangling vertices (by observation 2 above), giving
a polynomial running time when b is fixed.

Even though the algorithm runs in polynomial time for every fixed value of
b, its running time is not practical even for moderate values of b. The question
then arises of whether an algorithm can be found with better dependence on
b. For example, is there an algorithm for computing the bandwidth that runs
in time $O(n^c f(b))$, where c is some fixed constant independent of b, and f is
an arbitrary function that does not depend on n? Note that a positive answer
would not contradict the NP-hardness of the bandwidth problem (e.g., we can
set $f(b) = 2^b$ and then the running time becomes exponential when the bandwidth
is large). Questions of this type are studied by the theory of parameterized
complexity, and problems having a running time of the above form are called
fixed parameter tractable. See [?] and references therein. Using reductions between
parameterized NP-hard problems, a hierarchy of problems is established,
in which problems that are hard for a certain level of the hierarchy are not fixed
parameter tractable unless all problems lying in levels below it are. The bandwidth
problem is hard for every fixed level of the fixed parameter tractability
hierarchy [5].

3 Somewhat Efficient Algorithms 

NP-complete problems can be solved by exhaustive search. The running time for 
exhaustive search becomes forbiddingly large already for instances of small size. 
Sometimes, it is possible to design algorithms that are significantly faster than 
exhaustive search, though still not polynomial time. This makes the solution of 
somewhat larger size problems possible. This is often the case for problems in 
number theory such as factoring (these problems are often not NP-hard, but
still considered intractable), where for several decades there has been gradual
improvement in the running time. See [?] for example. These improvements
are of great significance to cryptography (and result in the need to use larger
numbers in cryptosystems such as RSA).

For NP-hard combinatorial optimization problems, the improvements in running
times are less dramatic. For problems such as 3SAT and max-CLIQUE, the
current best running times are still $O(c^n)$, for some 1 < c < 2 (rather than
roughly $2^n$ which is the running time of exhaustive search).

For the bandwidth problem, we can exhaustively try all possible linear arrangements
and choose the one with the lowest bandwidth. This has running
time roughly $n! \approx n^n$. Can we get the running time down to $c^n$ for some c > 0
independent of n?
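
For reference, the exhaustive search over all n! arrangements takes only a few lines of Python. This is a baseline sketch added for illustration (the name `min_bandwidth_bruteforce` is ours), usable only for very small n.

```python
from itertools import permutations

def min_bandwidth_bruteforce(n, edges):
    """Minimum bandwidth over all n! linear arrangements of vertices 0..n-1.

    Each permutation pos assigns vertex v the position pos[v].
    """
    return min(
        max((abs(pos[u] - pos[v]) for u, v in edges), default=0)
        for pos in permutations(range(n))
    )
```

For instance, a path on four vertices has bandwidth 1, a 4-cycle has bandwidth 2, and the complete graph on four vertices has bandwidth 3.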

By analogy with other problems, we may hope that the answer is positive.
Problems such as min-TSP, min-CUTWIDTH, minsum-LINEAR ARRANGEMENT
also require an ordering of the vertices while minimizing a certain objective
function (length of tour, maximum number of edges that cross a cut,
sum of edge lengths, respectively). They can all be solved in time roughly $2^n$ using
dynamic programming (one need only remember which vertices appeared in
the prefix of a linear arrangement, but not in what order they appeared). However,
the same dynamic programming approach does not work for the bandwidth
problem.
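
To illustrate the prefix-set idea on one of these problems, here is a hedged Python sketch of the subset dynamic program for min-CUTWIDTH. It is our own illustration, not code from the talk: the state is only the set S of vertices already placed, encoded as a bitmask, and the function name `cutwidth` is hypothetical. The running time is $2^n$ times a polynomial.

```python
def cutwidth(n, edges):
    """Minimum over vertex orderings of the maximum number of edges crossing
    any gap between a prefix and the remaining vertices."""
    def cut(S):
        # Edges with exactly one endpoint in the prefix set S.
        return sum(1 for u, v in edges if ((S >> u) & 1) != ((S >> v) & 1))

    best = [0] * (1 << n)  # best[S]: optimal value over orderings of prefix S
    for S in range(1, 1 << n):
        # Try every vertex v of S as the last one placed in the prefix.
        prev = min(best[S & ~(1 << v)] for v in range(n) if (S >> v) & 1)
        best[S] = max(cut(S), prev)
    return best[(1 << n) - 1]
```

The value of a prefix depends only on which vertices it contains, which is exactly what fails for bandwidth, where the positions of the most recently placed vertices matter.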

Here we sketch an algorithm (from [?]) for the bandwidth that runs in time
$c^n$. For simplicity in this presentation, we shall assume that both n and b are
powers of 2, and that G is connected.

1. First phase (finds to which segment each vertex belongs).

(a) Partition the interval [1, n] into 2n/b segments of length b/2.

(b) Place vertex $v_1$ in one of the segments. There are 2n/b possibilities here,
and all of them will be tried out using exhaustive search.

(c) Iteratively, take a yet unplaced vertex v that has a neighbor that is
already placed, and place v in a segment that is of distance at most two
from each one of its placed neighbors' segments. There are at most 5
possible segments available to v. There are n - 1 vertices to place using
this procedure, and hence at most $5^{n-1}$ possible placements. All of them
will be tried out using exhaustive search.

At the end of the first phase we have at most roughly $5^n$ arrangements of
vertices into segments, such that at least one of them is correct. Assume
now that the second phase is performed with a correct arrangement (and
multiply the running time by $5^n$).

2. Second phase (finds exact locations within segments):

(a) Keep only edges that connect vertices that are two segments away, as all
other edges will have length at most b regardless of the internal placement
of vertices within segments. The problem now decomposes to two
independent subproblems: that of finding a linear arrangement for vertices
within the even numbered segments, and that of finding a linear
arrangement for vertices within the odd numbered segments.

(b) For each subproblem recursively divide segments in two (of size b/4 at
the first step of the recursion, and sizes decrease by a factor of two with
each new level), guess for each vertex whether to place it in the left half
or right half of its segment, and then again partition the subproblem
into two independent problems, one involving the left side and the other
involving the right side. For a subproblem involving k vertices, there are
k guesses to make, leading to $2^k$ possibilities, all of which are tried out.
Keep only those possibilities in which no edge connects subsegments that
are at distance more than b apart.

The number of possibilities tried out in the second phase satisfies the recursion
$T(k) \le 2^k + 2T(k/2)$, implying roughly $2^n$ possibilities altogether. Hence
the running time of the algorithm is at most $5^n 2^n = 10^n$ (up to polynomial
factors). The base of the exponent can be somewhat improved using more careful
analysis.
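
The first phase described above can be sketched in Python as follows. This is an illustrative sketch only, not the implementation from the cited work: the function name `segment_assignments`, the adjacency-dictionary input and the 0-based segment indices are assumptions made here. It enumerates every assignment in which each newly placed vertex lies within two segments of all of its already placed neighbors.

```python
def segment_assignments(adj, num_segments):
    """Yield every phase-one assignment {vertex: segment index}.

    adj maps each vertex of a connected graph to a list of its neighbours.
    """
    # Order the vertices so that every vertex after the first one
    # already has a placed neighbour (a breadth-first order does this).
    start = next(iter(adj))
    order, seen = [start], {start}
    for v in order:                      # the list grows while we scan it
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                order.append(u)

    def extend(i, placed):
        if i == len(order):
            yield dict(placed)
            return
        v = order[i]
        nbr_segs = [placed[u] for u in adj[v] if u in placed]
        if not nbr_segs:
            # Only the first vertex: try every segment.
            candidates = range(num_segments)
        else:
            # Segments within distance two of every placed neighbour.
            lo = max(0, max(nbr_segs) - 2)
            hi = min(num_segments - 1, min(nbr_segs) + 2)
            candidates = range(lo, hi + 1)
        for s in candidates:
            placed[v] = s
            yield from extend(i + 1, placed)
            del placed[v]

    yield from extend(0, {})
```

For a connected graph on n vertices the generator yields at most (number of segments) times $5^{n-1}$ assignments, in line with the count in step (c).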

We note that a running time of $10^n$ improves over n! only for values of n for
which neither of these running times is practical. A more dramatic improvement
in the running time would be desirable. For example, is there for every ε > 0 an
algorithm that computes the bandwidth in time $O(2^{\epsilon n})$? We call such running
times weakly exponential. Similar questions are being asked for other problems, 
such as 3SAT, and the answer is unknown. There is initial work on systematic 
study of the issue of weakly exponential running time. For example, one would 
like to establish for a large class of problems that either all of them have weakly 
exponential running times, or none of them do. This calls for reductions between 
instances that are linear in terms of the resulting problem size (rather than poly- 
nomial, as in the case of reductions establishing NP-hardness). The computation 
time of the reductions can be allowed to be weakly exponential. See [f23|22j for 
some interesting work in this respect. 

For the bandwidth, the original reduction showing its NP-hardness starts
with a 3CNF formula with n variables, and ends with a graph with $n^c$ vertices,
where c > 1. However, it is possible to design other polynomial time reductions
from 3SAT to bandwidth in which the number of vertices in the resulting graph
is O(n), establishing that bandwidth does not have weakly exponential
algorithms unless 3SAT does.

4 Approximation Algorithms 

Due to the intractability of the bandwidth problem, one may be willing to settle 
for a polynomial time algorithm that finds a linear arrangement whose band- 
width is not optimal, but also not much larger than optimum. We say that an 
algorithm has approximation ratio p(n) if on an n node graph it produces a
linear arrangement whose bandwidth is within a factor of at most p(n) from 
optimal. 

A known lower bound on the bandwidth is obtained via the local density
bound. Let N(v, d) be the set of vertices at distance at most d from v. Then the
local density of a graph is $D = \max_{v,d} [|N(v,d)|/2d]$, and the optimal bandwidth
b satisfies $b \ge D$.
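
As a small illustration of the local density bound, the following Python sketch computes the local density by breadth-first search and compares it with the exact bandwidth found by trying all arrangements. Several choices here are assumptions rather than part of the source: the function names, the adjacency-dictionary input, and in particular reading the square brackets in the definition as rounding down. The brute-force search is only usable on very small graphs.

```python
from collections import deque
from itertools import permutations


def bfs_distances(adj, source):
    """Return {vertex: distance from source} for the reachable vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def local_density(adj):
    """max over v and d >= 1 of floor(|N(v, d)| / (2d)) (floor is assumed)."""
    best = 0
    for v in adj:
        dist = bfs_distances(adj, v)
        for d in range(1, max(dist.values()) + 1):
            ball = sum(1 for x in dist.values() if x <= d)  # |N(v, d)|
            best = max(best, ball // (2 * d))
    return best


def bandwidth_brute_force(adj):
    """Exact bandwidth by trying all n! linear arrangements."""
    vertices = list(adj)
    best = len(vertices)
    for perm in permutations(vertices):
        pos = {v: i for i, v in enumerate(perm)}
        width = max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]),
                    default=0)
        best = min(best, width)
    return best
```

On a star with four leaves both quantities equal 2, so the bound is tight there; in general the computed density never exceeds the bandwidth.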

The algorithm with best approximation ratio known for the bandwidth problem
produces (with high probability) a linear arrangement with bandwidth
$O(D (\log n)^3 \sqrt{\log n \log\log n})$. The algorithm can be interpreted in a geometric
fashion as follows. It first embeds the vertices of the graph in high dimensional
Euclidean space while balancing two conflicting requirements: keeping the Eu- 
clidean distance between vertices not larger than their distance in the graph, and 
making the volumes of the convex hulls of subsets of vertices of logarithmic size 
as large as possible. It then projects the geometric embedding on a random line 
and outputs the vertices in order of appearance on this line. The algorithm is 
nearly practical in terms of its running time (only slightly super linear). Despite
having an approximation ratio that is by far superior to that of any other known 
algorithm, the algorithm fails to take advantage of easy input instances, and of- 
ten produces linear arrangements of higher bandwidth than those produced by 
other algorithms. For trees, Gupta m shows an algorithm that produces a linear
arrangement with bandwidth $O(D(\log n)^{2.5})$. The analysis of this algorithm
borrows ideas from the analysis in [El, but the algorithm itself is different, and 
is more likely to produce good linear arrangements in practice (though only for 
trees). 

The analysis of the known approximation algorithms for the bandwidth com- 
pares the bandwidth of the linear arrangement obtained to the local density of 
the graph. Such an approach is not likely to produce approximation algorithms 
with sublogarithmic approximation ratios (compared to the true bandwidth). 
There are families of graphs with local density bounded above by a univer- 
sal constant, whereas their bandwidth can be arbitrarily large |9I8| . A gap of 
$\Omega(\log n)$ between local density and bandwidth can be demonstrated on trees 0
and on expander graphs. 

It is questionable whether there are polynomial time algorithms with sublog- 
arithmic approximation ratios for the bandwidth. Blache, Karpinski and Wirt- 
gen 0 showed that it is NP-hard to approximate the bandwidth within a ratio 
better than 3/2. This was later improved by Unger to every constant factor m- 
Presumably, Unger’s result can be extended to showing that the bandwidth can- 
not be approximated within a ratio that is a slowly growing function of n, unless 
3SAT has subexponential algorithms. It would be interesting to see whether this 
function comes close to log n. 

5 Heuristics 

In practice, heuristics for minimizing the bandwidth appear to work rather 
well HD). These heuristics are often based on numbering the vertices in breadth 
first search order, or on simple variations on this approach. 
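
A minimal Python sketch of the basic idea, numbering vertices in breadth first search order and measuring the bandwidth of the resulting arrangement, is given below. The function names and the adjacency-dictionary input are assumptions made for illustration, the graph is assumed to be connected, and practical heuristics add refinements (such as a careful choice of start vertex) that are not shown.

```python
from collections import deque


def bfs_order(adj, start):
    """Return the vertices of a connected graph in BFS order from start."""
    order, seen = [start], {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                order.append(u)
                queue.append(u)
    return order


def arrangement_bandwidth(adj, order):
    """Longest edge when vertices are laid out on a line in the given order."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)
```

On a path with five vertices, starting from an endpoint gives bandwidth 1, while starting from the middle vertex gives bandwidth 2, which shows how much the start vertex can matter.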

A theoretical explanation for the success of heuristics is given by Turner nq. 
He studies a random graph model in which a random graph with edge probability 
p is forced to have bandwidth at most b by deleting all edges that connect 
vertices whose indices differ by more than b. Thereafter the names of vertices 
are permuted at random and the resulting graph is given as input to the heuristic. 
Turner shows that a heuristic similar to BFS recovers for most such graphs a
linear arrangement of bandwidth at most b + O(log n). This is almost optimal
(when $b \gg \log n$) as it can be shown that the bandwidth of most such graphs is
at least b - O(log n).
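
The random model studied by Turner can be sketched as a small Python generator. The function name `planted_bandwidth_graph`, its parameters and its return format are assumptions made for this illustration. Sampling each pair of indices at distance at most b independently with probability p gives the same distribution as drawing a random graph with edge probability p and then deleting the longer edges.

```python
import random


def planted_bandwidth_graph(n, b, p, seed=None):
    """Return (edges, new_name) for a random graph of bandwidth at most b.

    new_name[i] is the shuffled name given to the vertex whose hidden
    position on the line is i; edges are reported using the new names.
    """
    rng = random.Random(seed)
    # Keep only pairs whose hidden indices differ by at most b.
    hidden_edges = [(i, j)
                    for i in range(n)
                    for j in range(i + 1, min(n, i + b + 1))
                    if rng.random() < p]
    # Permute the vertex names at random to hide the good arrangement.
    new_name = list(range(n))
    rng.shuffle(new_name)
    edges = [(new_name[i], new_name[j]) for i, j in hidden_edges]
    return edges, new_name
```

A heuristic under test would be given only `edges`; `new_name` records the hidden arrangement of bandwidth at most b against which its output can be compared.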

Turner assumes in his work that 0 < p < 1 is some constant independent of 
n. The case where p may depend on n was handled in m, where a modified 
algorithm is shown to produce a linear arrangement of bandwidth at most
(1 + ε)b, when p > log n/b. This modified algorithm produces two linear arrangements
(essentially performing BFS once from the left most vertex and once from the
right most vertex) and combines them into one linear arrangement. It is also 
shown in m that the algorithm extends to a semirandom graph model, in 
which an adversary is allowed to add arbitrary edges of its choice to the random 
graph prior to the deletion of the long edges. 

There are graphs on which the heuristics mentioned above output a linear 
arrangement with bandwidth that is a factor of $\Omega(n/\log n)$ larger than optimal.
However, the quality of a heuristic is measured neither with respect to worst 
case input instances, nor with respect to best case input instances. To evaluate 
a heuristic we need a notion of an average case input instance. This is a very 
elusive concept, and there is no one particular model for average case analysis. 
Unlike the case of approximation algorithms, in which we can order the quality 
of approximation algorithms by their worst case approximation ratio, there does 
not seem to be any agreed upon way of deciding which of two heuristics gives
better results (unless one of the heuristics is better than the other on every
instance). There is also great difficulty in establishing negative results for
heuristics. We describe below two methodologies for establishing limitations on
what can be achieved using heuristics. So far, neither of them has had a major
impact on the theory of heuristics, and much work remains to be done in this respect.

Levin jZZ| (see also US!) develops a theory of average case polynomial time. In 
his framework, we associate probability distributions with problems, and sample 
input instances using this probability distribution. Then an algorithm needs to 
solve the input problem in average polynomial time, where averaging takes into 
account the probability of generating the input instance. One may argue that 
a distributional problem is hard on average if every other NP-problem with 
a polynomially sampleable distribution can be reduced to this problem. Some 
distributional problems are known to be hard under this notion, and it remains 
a challenge to use this notion to prove average case hardness results for common 
NP-problems under “natural” distributions. 

Another approach for proving hardness results is to work in a semirandom 
model in which the input instance is chosen at random and then modified (sub- 
ject to some constraints) by an adversary. There it is sometimes possible to show 
that by varying some parameter of the input instances (that controls what frac- 
tion of the input is random and what fraction is adversarial) there is a shift 
from classes of inputs that are polynomial time solvable on average to classes of 
inputs that are NP-hard on average. See |?TR] for more details. 
6 Conclusions



We discussed four approaches for dealing with NP-hard problems. Three of these 
approaches (parameterized versions of the problem, weakly exponential time al- 
gorithms, approximation algorithms) involve worst case performance measures, 
leading to familiar methods for evaluating and comparing the quality of algorithms.
Reductions between problems are a very important tool in this respect.
The fourth approach (heuristics) involves average case performance measures, 
and conceptual work is still needed in defining measures for quantifying the 
quality of a heuristic. 

Of course, one can study mixtures of the approaches presented above. For 
example, one can mix the first and third approach and study approximation 
algorithms for special families of graphs. Indeed, this was done for the bandwidth 
problem [21 1‘i4l2d| . As another example, one may mix the first three approaches, 
and study relations between the approximability of parameterized versions of 
problems and the existence of weakly exponential time algorithms. In H2| it is 
shown that if one can distinguish in polynomial time between graphs with cliques 
of size at most log n and graphs with cliques of size at least 2 log n, then 3SAT
can be solved in expected time roughly 2^. 

Of the questions that remain open for the bandwidth problem, let us mention 
three: 



1. Does the bandwidth problem have considerably faster exponential time algorithms?
E.g., can it be solved in time roughly $2^n$ (rather than $10^n$, as shown
in P^l)?

2. Does the local density approximate the bandwidth within a logarithmic factor
(rather than polylogarithmic, as shown in ^2])? That is, is it true that
b = O(D log n)?

3. For a random graph (with constant edge probability p), remove all edges
that connect vertices whose indices differ by more than b. The bandwidth
of the resulting graph is at most b. Turner m shows that it is at least
b - O(log n) (under some restrictions on the size of b). When $b \ll \sqrt{n/\ln n}$,
the bandwidth is at least b - O(1) PO]. Is the bandwidth of these random
graphs exactly b (with high probability)?
Summary. The invention of the so-called DNA sequencing more than 20 years
ago has by now created an exponentially exploding flood of sequence data. For 
a computer scientist, such data consists of strings of symbols in an alphabet of 
size four. Being discrete by nature, the analysis and handling of sequence data is 
an exceptionally attractive and - noting its role in the heart of life - challenging 
application domain for combinatorial algorithmics. Hence it does not come as a 
surprise that computational molecular biology and bioinformatics are currently 
very active interdisciplinary research areas |?rm] .

The algorithms for solving the so-called DNA fragment assembly problem 
and their implementations used as an integral part of the DNA sequencing pro- 
cess are one of the major successes of computational biology. The early develop- 
ments m have been followed by more sophisticated methods 0. This line of 
research culminated in the recent announcement by Celera Genomics of the completed
sequencing of the entire human genome. Computational methods have
had a crucial role in this achievement.

Another important success of computational biology is the creation of se- 
quence databases and, in particular, the development of the very fast methods 
such as BLAST for homology searching, that is, for finding in the database the 
sequences that are approximately similar to a given sequence |2| . Such searches 
are routinely used in biological research to compare any new sequence against 
all old ones. 

The availability of entire genomes of several organisms as well as some new 
measuring instruments such as the DNA microarrays are rapidly introducing 
new computational problems. Knowing the raw DNA sequence of a genome is 
just a starting point for more refined analysis. The DNA microarrays give time- 
series data on the expression levels of the genes, i.e., how actively each gene is 
used, during different phases of the development of the organism or under exter- 
nal stress or diseases m- Analysing this rich data together with the genomic 
sequence itself opens new opportunities to trace the ’run-time behaviour’ of the 
’program’ encoded into the genome. 

For example, the following data analysis scenario can be followed: First, find
potentially co-regulated groups of genes by clustering together the genes with
similar expression level profiles. Next, pick from the genome the so-called regulatory
regions associated with each gene in each group. Then, search for patterns
of symbols that are overrepresented in the regulatory regions for each group. Such
patterns are potential transcription binding sites, i.e., sites in the genome where
a protein, specific for the pattern, binds itself and regulates in this way the use
of the gene.

* A work supported by the Academy of Finland.

I will discuss in the talk our work 0 along this line, applied to the yeast 
(Saccharomyces cerevisiae) genome. Different clustering problems are the most
interesting algorithmic tasks contained in this type of study. One of them, namely 
finding common patterns of symbols in a set of sequences, will be discussed in 
more detail. A quite general and flexible solution can be obtained using simple 
suffix-tree techniques |^. 

To conclude, it should be emphasized that in computational biology the typ- 
ical data is noisy and incomplete. Hence the algorithms must be robust and 
noise-tolerant, both properties that are often ignored in theoretical algorith- 
mics. Statistical considerations are also becoming more and more important. As 
a positive remark it should be noted that, perhaps unexpectedly, the genomes 
are not intolerably large. The speed and storage capacity of the basic PC is in- 
creasing rapidly. Therefore it will soon be possible to store and mine the entire 3 
billion bases long human genome in the core memory of your desktop computer. 
Abstract. We consider dictionaries over the universe $U = \{0, 1\}^w$ on
a unit-cost RAM with word size w and a standard instruction set. We
present a linear space deterministic dictionary with membership queries
in time $(\log\log n)^{O(1)}$ and updates in time $(\log n)^{O(1)}$, where n is the size
of the set stored. This is the first such data structure to simultaneously
achieve query time $(\log n)^{o(1)}$ and update time $O(2^{(\log n)^c})$ for a constant
c < 1.



1 Introduction 

Among the most fundamental data structures is the dictionary. A dictionary 
stores a subset S of a universe U, offering membership queries of the form “x ∈ S?”.
The result of a membership query is either ’no’ or a piece of satellite data
associated with x. Updates of the set are supported via insertion and deletion 
of single elements. 

Several performance measures are of interest for dictionaries: The amount of 
space used, the time needed to answer queries, and the time needed to perform 
updates. The most efficient dictionaries known depend on a source of random 
bits (are randomized, as opposed to deterministic). However, being randomized 
means that either: 1. There is a chance that the expected time bounds do not 
hold, or 2. There is a chance of the data structure returning a wrong answer. 
In some situations, this may not be acceptable. Even if their use is acceptable, 
random bits may be expensive or unavailable. Finally, an understanding of the 
power of randomization is important from a theoretical point of view. All this 
has led to an interest in derandomization of known randomized algorithms and 
data structures. Several recent papers consider deterministic dictionaries inna 
Eniraiin] However, previous space-efficient dictionaries with very fast lookups 
(time $(\log n)^{o(1)}$) have had update time much larger than that of, say, binary
search trees. Therefore these dictionaries are of interest mainly when insertions 
are quite rare compared to lookups. Our interest here lies in obtaining
space-efficient deterministic dictionaries which combine fast updates (time $(\log n)^{O(1)}$)
with very fast lookups. 
The model of computation used is a unit-cost word RAM, in which each memory
register contains a w-bit integer (a word). This model of computation, resembling
modern computers, has been the object of much recent research. Hagerup’s
survey P| contains a description of the model. We adopt the multiplication model 
whose instruction set includes addition, bitwise boolean operations, shifts and 
multiplication. Note that all operations can also be carried out in constant time 
on arguments spanning a constant number of words. The universe considered is 
the set of machine words, $U = \{0, 1\}^w$. For simplicity, we assume that each piece
of satellite data occupies a single machine word (this could be a pointer to more 
bulky data). Throughout this paper, S will refer to a set of n elements from U. 
For notational convenience we omit the “time tag” on S, n and other symbols 
denoting dynamically changing values. All bounds will be independent of w, un- 
less explicitly stated. Note that the optimal space consumption of a dictionary 
is O(n) words.



1.1 Related Work 

The seminal result of Fredman, Komlos and Szemeredi |2| is that a static dictio- 
nary (i.e. without update operations) can have constant query time and linear 
space consumption. Allowing randomization, the FKS static dictionary can be 
made dynamic, supporting insertions and deletions in amortized expected con- 
stant time 0. Improving this, Dietzfelbinger and Meyer auf der Heide ^ have 
constructed a dictionary in which all operations are done in constant time with 
high probability (i.e. probability at least $1 - n^{-c}$, where c is any constant of our
choice). A simpler dictionary with the same properties was later developed Pj. 
As for randomized dictionaries, this leaves very little to be improved. 

Without a source of random bits, the task of simultaneously achieving fast 
updates and constant query time seems considerably harder. The best deterministic
dictionary with constant query time supports updates in time $O(n^{\epsilon})$, for
constant ε > 0 El . In fact, a range of trade-offs between update time and query
time is known. For query time O(q(n)), where $q(n) = O(\sqrt{\log n})$, update time
$O(n^{1/q(n)})$ can be achieved m. The best known result in the situation where
update and query time are considered equally important, is $O(\sqrt{\log n / \log\log n})$
time per dictionary operation. It is a dynamization of the static data structure of 
Beame and Fich [ 2 | using the exponential search trees of Andersson and Thorup 

P 

The Beame-Fich-Andersson-Thorup (BFAT) data structure in fact supports
predecessor queries of the form “What is the largest element of S not greater
than x?”. Its time bound improves significantly if the word length is not too
large compared to log n. For example, if $w = (\log n)^{O(1)}$, the time per operation
is $O((\log\log n)^2 / \log\log\log n)$. This will be a key component in our construction.

An unpublished manuscript by Sundar PS| states an amortized lower bound
of time $\Omega(\log\log_w n / \log\log\log_w n)$ per operation for a deterministic
dictionary in Yao’s cell probe model HSI, which in particular implies the same
lower bound on the word RAM. Note that for $w = (\log n)^{O(1)}$, the BFAT data
structure has time per operation polynomially related to the lower bound. The
challenge therefore seems to be finding ways of dealing with large word length.

Fig. 1. Overview of deterministic dictionaries using linear space.

1.2 This Work 

In this paper we obtain a dictionary with query time $O((\log\log n)^2 / \log\log\log n)$
and amortized update time $O((\log n)^2)$ (we sketch how to make the latter bound
worst-case). We deal with the problem of large word lengths by devising a dy- 
namic universe reduction scheme, which reduces the problem to one within a 
smaller universe, which is then handled by the BFAT data structure. An in- 
teresting aspect of the reduction is that queries for the same element at two 
consecutive points in time usually translate to different BFAT queries. In par- 
ticular, it is crucial that the BFAT data structure answers predecessor queries, 
and not just membership queries. 

Our data structure is the first deterministic dictionary to simultaneously 
achieve query time $(\log n)^{o(1)}$ and update time $O(2^{(\log n)^c})$ for a constant c < 1.
The data structure is weakly non-uniform in that it needs access to a fixed 
number of word-size constants depending (only) on w. These constants may be 
thought of as computed at “compile time” . 

In the following we assume that $w > (\log n)^5$. Smaller word sizes can be handled
using the BFAT data structure directly, and standard rebuilding techniques
can be used to change from one data structure to the other. Similarly, we as- 
sume that n is larger than some fixed, sufficiently large constant, since constant 
size dictionaries are trivial to handle. We will look at machine words as binary 
numbers, with the most significant bits on the left and the least significant bits 
on the right. Bit positions are numbered from right to left, starting with zero. 

2 Universe Reduction 

Miltersen HH has shown the utility of error-correcting codes to deterministic 
universe reduction. A universe reduction function p : U → U' translates the
dictionary problem from universe U to the reduced universe U' (a search for x
becomes a search for p(x)). The advantage of this is that U' may be smaller and
easier to handle. Previous universe reduction functions for the static dictionary
problem EEUig have been 1-1 on S. In the dynamic case this appears hard to
combine with efficient updates, and in our construction the reduction function is
O(log n)-1. That is, O(log n) elements of S may translate into the same element
p(x). A search among the elements “attached to p(x)” is then needed to establish
whether x ∈ S.

2.1 Error-Correcting Codes and Distinguishing Bits

Miltersen’s approach plays a key role in our construction, so we review it here. 
The basic idea is to employ an error-correcting code $e : \{0,1\}^w \to \{0,1\}^{4w}$ and
look at the dictionary problem for the transformed set $\{e(x) \mid x \in S\}$. For this it
is possible to find a very simple function which is 1-1 on S, namely a projection
onto O(log n) bit positions.

The code must have relative minimum distance bounded from 0 by a fixed
positive constant, that is, there must exist a constant α > 0 such that any two
distinct codewords e(x) and e(y) have Hamming distance at least α · 4w (the
supremum of such constants is called the relative minimum distance of the code).
We can look at the transformed set without loss of generality, since Miltersen
has shown that such an error-correcting code can be computed in constant time
using multiplication: $e(x) = c_w \cdot x$, for suitable $c_w \in \{0,1\}^{3w}$. The choice of
$c_w$ is a source of weak non-uniformity. The relative minimum distance for this code
is greater than 1/11. In the following, α will denote a constant strictly smaller
than the relative minimum distance of the error-correcting code (e.g. α = 1/11).

Lemma 1. (Miltersen) For any $R \subseteq U \times U$ there exists a discriminating bit
position $i \in \{0, \ldots, 4w-1\}$ such that
$|\{(x,y) \in R \mid x \ne y,\ e(x)_i = e(y)_i\}| \le (1-\alpha)|R|$.

Corollary 2. (Miltersen) Let T be a set of m elements. There exists a set of
distinguishing bit positions $D \subseteq \{0, \ldots, 4w-1\}$ with $|D| = O(\log m)$ such that for
all pairs of distinct elements $x, y \in T$, there is $i \in D$ where $e(x)_i \ne e(y)_i$. The set
D can be constructed deterministically in time O(m log m), given a deterministic
O(m) time algorithm for finding a discriminating bit from the equivalence classes
of an equivalence relation over T.

Proof sketch. Elements of D may be found one by one, as discriminating bits 
of the equivalence relation where x, y ∈ T are equivalent iff e(x) and e(y) do not
differ on the bit positions already chosen. The number of pairs not distinguished 
decreases exponentially with the number of bit positions chosen. □ 
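
The greedy selection in the proof sketch can be illustrated with a short Python sketch. It is a simplified stand-in, not the algorithm of the paper: it works directly on the given integers rather than on codewords e(x), it scans all bit positions naively instead of using word-level parallelism, and the names `agreeing_pairs` and `distinguishing_bits` are invented here. Without the error-correcting code the number of chosen positions is not guaranteed to be logarithmic, but the structure of the greedy loop is the same.

```python
def agreeing_pairs(classes, i):
    """Count pairs inside the same class whose members agree on bit i."""
    total = 0
    for cls in classes:
        ones = sum((x >> i) & 1 for x in cls)
        zeros = len(cls) - ones
        total += ones * (ones - 1) // 2 + zeros * (zeros - 1) // 2
    return total


def distinguishing_bits(elements, width):
    """Greedily pick bit positions until all elements are told apart.

    elements are distinct non-negative integers below 2**width.
    """
    classes = [list(set(elements))]   # one class: nothing distinguished yet
    chosen = []
    while any(len(cls) > 1 for cls in classes):
        # Discriminating bit: leaves the fewest undistinguished pairs.
        i = min((j for j in range(width) if j not in chosen),
                key=lambda j: agreeing_pairs(classes, j))
        chosen.append(i)
        # Refine every class according to the value of bit i.
        refined = []
        for cls in classes:
            zeros = [x for x in cls if not (x >> i) & 1]
            ones = [x for x in cls if (x >> i) & 1]
            refined.extend(part for part in (zeros, ones) if part)
        classes = refined
    return chosen
```

Projecting each element onto the returned positions then gives a function that is 1-1 on the input set.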

Miltersen’s universe reduction function is simply $x \mapsto e(x)$ AND d, where
AND denotes bitwise conjunction and d is the incidence vector of D. The reduced
universe U' consists of the 4w-bit vectors which are zero outside the positions
given by D.
Two problems remain: 1. We must show how to find discriminating bit positions
in time O(m). 2. We want the reduction function to map to O(log m) consecutive
bits, that is, to $\{0,1\}^{O(\log m)}$. The first problem was solved by Hagerup
im. We need the following slight extension of his result to also solve the second
problem:

Lemma 3. (Hagerup) Given a set T of m elements, divided into equivalence
classes, a discriminating bit position i can be found in time O(m) by a deterministic,
weakly non-uniform algorithm. Further, for any set $I \subseteq \{0, \ldots, 4w-1\}$
of size $O((\log n)^4)$ (given as a bit vector), we can assure that $i \notin I$.

Proof sketch. It can be shown how to compute
$|\{\{x,y\} \subseteq T \mid x \ne y,\ x \equiv y,\ e(x)_i = e(y)_i\}|$
for all $i \in \{0, \ldots, 4w-1\}$ in time O(m). The algorithm employs
word-level parallelism, and the result vector spans O(log m) words, since each
number occupies O(log m) bits. Word-parallel binary search can be used to find
the smallest entry. To avoid entries in I, simply overwrite the entries of I with
the largest possible integer before finding the minimum. This corresponds to
changing the error-correcting code to be constant (i.e. non-discriminating) on
the bit positions of I. Since $|I| = O((\log n)^4)$ and the length of codewords is
$4w > 4(\log n)^5$, the relative minimum distance of this modified code is still
> α (for n large enough). Hence, this procedure will find a discriminating bit
position. □

2.2 Multiple Set Universe Reduction 

To accommodate efficient updates, we will not maintain a set of distinguishing bit
positions for S itself. Instead, we maintain $k = \lceil \log(n+1) \rceil$ sets of distinguishing
bit positions $D_0, \ldots, D_{k-1}$ for subsets $S_0, \ldots, S_{k-1}$ whose (disjoint) union is S
and where $|S_i| \in \{0, 2^i\}$. By the results of Sect. 2.1 we can achieve $|D_i| = O(i)$,
and recomputation of $D_i$ when $S_i$ changes takes time $O(2^i i)$. Additionally, we
can make the complete set of distinguishing bit positions well separated, that
is, no pair of positions differ by less than $2c(\log n)^2$, where c is a suitably large
constant.

Since the distinguishing bit positions are well separated, we are able to “collect”
and order the distinguishing bits within $O((\log n)^2)$ consecutive bit positions,
such that the distinguishing bits of $S_0$ are least significant, and the
distinguishing bits of $S_{k-1}$ are most significant. For each empty set $S_i$ we will
have a number of zero-bits. The following lemma makes this precise.

Lemma 4. Given a list $d_1, \ldots, d_p$ of well separated bit positions, where
$p \le c(\log n)^2$, there is a function $f_d : \{0,1\}^{4w} \to \{0,1\}^p$ such that for any x,
$f_d(x)_i = x_{d_i}$. The function can be evaluated in constant time, and updated under changes
of bit positions in constant time.

Proof. We will show how to “move” bit $d_i$ of $x \in \{0,1\}^{4w}$ to bit u + i of a (u + p)-bit
string, where $u \ge \max_i d_i$ (the desired value can then be obtained by shifting the
word by u bits). We simply multiply x by the constant $\sum_i 2^{u+i-d_i}$ (a method adopted
from 0 P- 428-429]). One can think of the multiplication as p shifted versions
of x being added. Note that if there are no carries in this addition, we do indeed
get the right bits moved to u + 1, ..., u + p. However, since the bit positions
are well separated, all carries occur either left of the (u + p)th position (which
is harmless) or right of position u - p (which can never influence the values at
positions greater than u, since there are more than enough zeros in between to
swallow all carries). Note that the multiplier can be updated in constant time when a bit
position changes. □
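
The multiplication trick in the proof can be tried out with Python's arbitrary precision integers. The sketch below makes several assumptions that are not spelled out in the text: the name `make_gather` is invented, "well separated" is taken to mean that any two positions differ by at least 2p, and the input is first masked so that only the bits at the chosen positions remain (as they would after the AND with the incidence vector).

```python
def make_gather(positions):
    """Build f_d for well separated bit positions d_1, ..., d_p.

    Bit i-1 of the returned function's result equals bit d_i of its input.
    """
    p = len(positions)
    ordered = sorted(positions)
    # Assumed separation requirement: gaps of at least 2p.
    if any(b - a < 2 * p for a, b in zip(ordered, ordered[1:])):
        raise ValueError("bit positions are not well separated")
    u = max(positions)
    mask = sum(1 << d for d in positions)
    # One shifted copy of x per position: copy i moves bit d_i to bit u + i.
    multiplier = sum(1 << (u + i - d)
                     for i, d in enumerate(positions, start=1))

    def gather(x):
        product = (x & mask) * multiplier
        # Bits u+1 .. u+p hold the collected bits; shift and cut them out.
        return (product >> (u + 1)) & ((1 << p) - 1)

    return gather
```

When a single position $d_i$ changes, only one power of two in the multiplier (and one bit of the mask) has to be replaced, which corresponds to the constant time update mentioned in the lemma.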

We are now ready to describe how to maintain the dynamic universe reduction
function under updates. New elements are inserted in the lowest numbered empty
set $S_i$ together with the elements of $S_0, \ldots, S_{i-1}$ (these sets are then “emptied”).
Note that the work per element when constructing a new set of distinguishing
positions is O(log n). Since elements are always transferred to higher numbered
sets, the total amortized work for an insertion is $O(k \log n) = O((\log n)^2)$. As we
will see in the next section, this cost will be dominant in the cost of an insertion
in the final dictionary.
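
The insertion rule behaves like incrementing a binary counter, which the following Python sketch makes explicit. Only the bookkeeping of the sets is shown; the recomputation of the distinguishing positions $D_i$ is indicated by a comment, and the function name `insert` and the list-of-lists representation are assumptions made for illustration.

```python
def insert(sets, x):
    """Insert x, keeping the invariant len(sets[i]) in {0, 2**i}.

    Returns the index i of the set that received the elements.
    """
    carried = [x]
    i = 0
    # Empty S_0, S_1, ... until the lowest numbered empty set is found.
    while i < len(sets) and sets[i]:
        carried.extend(sets[i])
        sets[i] = []
        i += 1
    if i == len(sets):
        sets.append([])
    # 1 + (2**0 + ... + 2**(i-1)) = 2**i elements end up in S_i.
    sets[i] = carried
    # Here D_i would be recomputed for the new contents of S_i.
    return i
```

After n insertions the nonempty sets correspond exactly to the 1-bits in the binary representation of n, and each element has moved at most k times, always to a higher numbered set.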

The universe reduction function will not be updated during deletions. Rather, 
deletions are implemented by simply marking deleted elements in the dictionary. 
When more than half of the elements in the dictionary are marked, a new dic- 
tionary containing the unmarked elements is constructed. The cost of this is 
amortized over the deletions, which hence also have cost $O((\log n)^2)$.

3 Using the Predecessor Data Structure 

Recall that our universe reduction function, which we will call p, computes the
concatenation of functions $f_{k-1}, \ldots, f_0$ which are 1-1 on $S_{k-1}, \ldots, S_0$, respectively.
The value p(x) after x is inserted in $S_i$ is used as key for x in the BFAT predecessor
data structure. Functions $f_0, \ldots, f_{i-1}$ return zero vectors at this time.
However, these functions will change in the period until the next update of $S_i$,
and specifically $f_0(x), \ldots, f_{i-1}(x)$ may change. When a search for p(x) is conducted,
the result will be either the BFAT key for x, or that of a key y later
inserted, whose BFAT key agrees with that of x except possibly for some of
the values of $f_0, \ldots, f_{i-1}$. In this case we want x to be present in y’s associated
(sorted) list of elements. That is, for each new key p(y) in the BFAT data
structure, we want a list of elements which includes $x \in S_i$ iff x and y agree on
$f_{k-1}, \ldots, f_i$.

A predecessor query on p(y) - 1 will return the BFAT key which has the
longest common prefix with p(y) (if any). By invariant, the associated list of this
key contains all the elements needed, apart from y itself, so it is easy to create
the list associated with y. The crux is that, since $f_{k-1}, \ldots, f_0$ are 1-1, an associated
list can contain at most one element from each set.

Example. We go through Fig. 2. This example has 3, 4 and 5 distinguishing
bit positions for $S_0$, $S_1$ and $S_2$, respectively. The keys inserted in
the BFAT data structure are annotated with their list of elements. At t = 4 the
dictionary contains four elements, denoted a, b, c, d, all residing in $S_2$.
At t = 5 element e is inserted and put into $S_0$. The key for e coincides with
the key for c on the first five bits, so the associated list contains c and e.
A search for the key of c at this time would in fact find 00111 0000 000, so c
is not strictly necessary in the new list. However, at t = 6 element f enters,
and $S_1$ is filled by e and f. After this, a search for the key of c will find
00111 0010 000, and c can be found in the new list. At t = 7 element g is
inserted, and its key coincides with both the first five bits of c's key and
the first nine bits of e's key, so the associated list becomes ceg.



t = 4               t = 5               t = 6               t = 7

00010 0000 000 a    00010 0000 100      00010 0001 000      00010 0001 010
00110 0000 000 b    00110 0000 110      00110 1000 000      00110 1000 000
00111 0000 000 c    00111 0000 001      00111 1100 000      00111 0101 011
11011 0000 000 d    11011 0000 110      11011 0010 000      11011 0010 001
                    00111 0000 011 ce   00111 0010 000 ce   00111 0010 101
                                        11111 1101 000 f    11111 1101 100
                                                            00111 0010 011 ceg

Fig. 2. Universe reduction function values for elements in S during three insertions.
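The two searches for c described in the example can be replayed with a sorted list standing in for the BFAT structure. This is only a sketch: the helper names `key` and `search` are hypothetical, the bit strings are the ones quoted in the example and legible in the figure, and it is an assumption that the key inserted for e at t = 5 is still present at t = 6 (the outcome does not depend on it).

```python
from bisect import bisect_right


def key(bits):
    """Parse a bit string such as '00111 0000 000' into an integer key."""
    return int(bits.replace(" ", ""), 2)


def search(table, query_bits, element):
    """Predecessor lookup followed by a scan of the associated list.

    table maps stored keys to their associated lists (strings of element names).
    Returns the key that was found, or None if the element is not in its list.
    """
    keys = sorted(table)
    i = bisect_right(keys, key(query_bits))
    if i == 0:
        return None
    found = keys[i - 1]
    return found if element in table[found] else None


t5 = {
    key("00010 0000 000"): "a",
    key("00110 0000 000"): "b",
    key("00111 0000 000"): "c",
    key("11011 0000 000"): "d",
    key("00111 0000 011"): "ce",   # inserted for e at t = 5
}
t6 = dict(t5)
t6[key("00111 0010 000")] = "ce"   # new key for e once S_1 is built
t6[key("11111 1101 000")] = "f"

hit_t5 = search(t5, "00111 0000 001", "c")   # value of p(c) at t = 5
hit_t6 = search(t6, "00111 1100 000", "c")   # value of p(c) at t = 6
```

At t = 5 the lookup lands on c's own key, and at t = 6 it lands on the key created for e, whose list contains c, matching the description in the example.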



3.1 Time and Space 

A search for $x$ requires computation of $p(x)$ in constant time, a predecessor
lookup in time $O((\log \log n)^2 / \log \log \log n)$ and finally search of an
associated list in time $O(\log \log n)$. That is, the total time is
$O((\log \log n)^2 / \log \log \log n)$.

As for insertions, we already argued that the amortized cost of maintaining
the universe reduction function is $O((\log n)^2)$, so we only need to see that
the cost of maintaining the associated lists is no larger. This is not hard,
since all that is needed is a single predecessor query and insertion of an
element in a sorted list of length $O(\log n)$.

The only part of the data structure which is not clearly in linear space is
the set of associated lists, where elements may occur $\log n$ times. To see
that their total length is $O(n)$, note that there can be no more than
$n/2^{i-1}$ lists of length $i$, since such lists must have been created in
connection with insertion of elements in $S_0, \ldots, S_{k+1-i}$.
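To see how a count of this form gives linear space, one can sum over list lengths. The following worked bound assumes the count is at most $n/2^{i-1}$ lists of length $i$ (the exponent is a reading of the damaged formula, not a certainty) and that list lengths range up to $k+1$:

```latex
\sum_{i=1}^{k+1} i \cdot \frac{n}{2^{i-1}}
  \;\le\; n \sum_{i \ge 1} \frac{i}{2^{i-1}}
  \;=\; n \cdot \frac{1}{(1 - 1/2)^2}
  \;=\; 4n \;=\; O(n).
```

The middle step uses the identity $\sum_{i \ge 1} i x^{i-1} = 1/(1-x)^2$ at $x = 1/2$.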
4 Final Remarks

4.1 Speedups 

Updates can be sped up slightly, to time $O((\log n)^2 / \log \log n)$, by using
another strategy, in which there are $\Theta(\log n)$ sets of each size, and
only $O(\log n / \log \log n)$ different set sizes. If the requirement of linear
space is abandoned, substituting van Emde Boas trees m for the BFAT data
structure gives membership queries
in time O(loglogn). The space usage then rises to 

It can be noted that the predecessor data structure is used in such a way that
it essentially answers "longest common prefix" queries on strings of length
$k + 1$, where the characters are described by the bits corresponding to sets
$S_k, \ldots, S_0$, respectively. A plausible way of improving the query time
to, say, $O(\log \log n)$ is by designing a faster data structure which can
find such longest common prefixes.

4.2 Worst-Case Bounds 

We gave amortized bounds. The same worst-case bounds follow by standard
lazy rebuilding techniques, to be sketched below. Where the amortized insertion
algorithm would "build" $S_i$ and empty $S_{i-1}, \ldots, S_0$, the worst-case
insertion algorithm keeps $S_{i-1}, \ldots, S_0$ in memory and starts building
$S_i$ at a pace of $c \log n$ steps per insertion (for some sufficiently large
constant $c$). Only when $S_i$ is completed do we throw out the lower numbered
sets.

More precisely, we now have sets $S_{i,j}$ for $0 \le j < i \le k$, where
$|S_{i,j}| \in \{0, 2^j\}$. The first index signifies that $S_{i,j}$ will next
become part of a new set of size $2^i$. Consider insertion number
$2^b d - 2^a$, where $a < b$ (any positive integer can be written like this for
unique integers $a$, $b$ and $d$). At this point we start constructing
$S_{b,a}$ from the new element and $S_{a,0}, \ldots, S_{a,a-1}$. As the last
stage of the construction, we set $S_{a,0} = \cdots = S_{a,a-1} = \emptyset$.
Constant $c$ above can be chosen such that this is guaranteed to be finished
before any of the sets $S_{a,0}, \ldots, S_{a,a-1}$ are to be reconstructed.
The ordering of distinguishing bits is with respect to primarily the first
index, secondarily the second index.
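The decomposition of the insertion number can be computed directly. The sketch below assumes the form $2^b d - 2^a$ with $a < b$, and additionally assumes that $d$ is odd, which is what makes the triple unique; the function name `decompose` is hypothetical.

```python
def decompose(t):
    """Write t = 2**b * d - 2**a with a < b and d odd; return (a, b, d)."""
    if t < 1:
        raise ValueError("t must be a positive integer")
    a = (t & -t).bit_length() - 1   # number of trailing zero bits of t
    m = (t >> a) + 1                # t / 2**a is odd, so m is even
    s = (m & -m).bit_length() - 1   # trailing zero bits of m, at least 1
    return a, a + s, m >> s


examples = {t: decompose(t) for t in (1, 2, 3, 5, 12)}
```

Under this reading, $a$ is the number of trailing zero bits of the insertion number, so the set being built has size $2^a$, in line with the amortized scheme where an insertion fills the lowest numbered empty set.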

Since we need associated element lists of length $\Omega((\log n)^2)$, we
cannot afford to use sorted lists as before (updates would become more
expensive). Instead, we use persistent balanced search trees p], which support
updates and queries in time $O(\log t)$ for a sequence of trees of size at most
$t$. One technicality is that many instances of the algorithm finding
distinguishing bits have to run at the same time and must produce well
separated bit positions. However, since positions are chosen one by one, this
poses no problem. In addition to what is done in the amortized case, the
worst-case deletion algorithm inserts two elements of $S$ in a new dictionary.
When the transfer of all elements in $S$ is completed, the new dictionary takes
the place of the old one. Of course, transferred elements may be deleted before
the new dictionary takes over.
5 Conclusion

We have seen a new lookup time vs insertion time trade-off for linear space
deterministic dictionaries. This presents progress towards closing the gap
between known upper and lower bounds. It also shows that universe reduction
techniques have a place not only in the static setting.

The big open question is whether updates in such a dictionary can be
accommodated in time $(\log n)^{o(1)}$. For example, time
$(\log \log n)^{O(1)}$ would mean that Sundar's lower bound is tight up to a
polynomial. For $w = (\log n)^{O(1)}$ this is achieved by the BFAT data
structure. Thus, large word length seems to be the main enemy, and new universe
reduction schemes with faster updates appear a promising approach.
Abstract. Pairing heaps are shown to have constant amortized time insert and
zero amortized time meld, thus improving the previous $O(\log n)$ amortized
time bound on these operations. It is also shown that pairing heaps have a
distribution sensitive behavior whereby the cost to perform an extract-min on
an element $x$ is $O(\log \min(n, k))$ where $k$ is the number of heap
operations performed since $x$'s insertion. Fredman has observed that pairing
heaps can be used to merge sorted lists of varying sizes optimally, within
constant factors. Utilizing the distribution sensitive behavior of pairing
heaps, an alternative method that employs pairing heaps for optimal list
merging is derived.



1 Introduction 

Self adjusting data structures, through the use of simple update rules, are often 
able to match the asymptotic performance of non-self adjusting data structures 
over any sequence of operations. They do not store balance information and 
thus require less memory. Self adjusting structures are relatively easy to code 
and often empirically outperform their non-self adjusting counterparts. Some self 
adjusting data structures asymptotically perform as well as off-line algorithms 
on classes of execution sequences defined by various structural or distributional 
characteristics. Splay trees 0, a self adjusting binary search tree, have all of these 
qualities, and are clearly favorable over their non-self adjusting counterparts, 
both theoretically and empirically, in many situations. However, with respect 
to heap design, the self adjusting methodology has not achieved corresponding 
success. For pairing heaps, one form of self adjusting heap, we partially rectify 
this by asymptotically improving existing upper bounds to bring them closer 
to the best non-self adjusting data structure as well as introducing distribution 
sensitive upper bounds. 

The leading non-self adjusting heap is the Fibonacci heap [7] which has
constant amortized time make-heap, insert, find-min, meld, and decrease-key, while
supporting delete and extract-min in O(logn) amortized time. A recent alter- 
native to Fibonacci heaps due to Kaplan and Tarjan, thin heaps 0, lowers 
the pointer and balance requirements, but remains cumbersome. It was conjec- 
tured in [0| and empirical evidence was presented in Stasko and Vitter m that 
pairing heaps share the same amortized cost per operation as Fibonacci heaps. 

* Research supported by NSF grant CCR-9732689. 
However, this possibility was eliminated when it was shown by Fredman 0 that
the amortized cost of decrease-key is $\Omega(\log \log n)$. In this work
we present a tighter analysis of pairing heaps than found in that proves, with
the exception of decrease-key operations, pairing heaps share the same
asymptotic runtime per operation as Fibonacci heaps. Specifically, the
amortized upper bound of $O(\log n)$ for the insert and meld operations is
improved to $O(1)$ for insert and $O(0)$^1 for meld. It should be noted that
Stasko and Vitter in introduced a variant of pairing heaps, the auxiliary
twopass method, and proved that this structure supported constant time insert.
However, their analysis explicitly forbade the decrease-key operation.

Pairing heaps use a restructuring heuristic that bears a strong similarity to
that of splay trees. While the ability of splay trees to exhibit certain types
of distribution sensitive optimality has been extensively studied, such
behavior, while expected in pairing heaps, has never been demonstrated. We
prove one result, similar to the working set theorem for splay trees jO|, that
implies that if an item is in a pairing heap of maximum size $n$ and $k$ heap
operations have been performed since its insertion, extract-min operations take
amortized time $O(\log \min(n, k))$. This result holds for several variants of
pairing heap and through the depletion transformation of Fredman for top down
skew heaps m as well. Our results are more robust than some of the results
on splay trees, as we allow the heap size and contents to dynamically change,
as opposed to the analyses of splay trees which only study the access operation
when investigating distribution sensitive effects.

Fredman [3] has shown that $n$ sorted lists of varying sizes can be optimally
merged (within constant factors) using pairing heaps in the following manner:
First, each sorted list is represented as a linked list, viewed as a linearly
structured heap-ordered tree. These trees are then combined into a single tree
using $n$ pairing heap meld operations. Finally, a single sorted list is
obtained by executing repeated extract-min operations. Inspired by Fredman's
results, an alternative approach to list merging proceeds by inserting the
smallest element from each list into a pairing heap, and then repeatedly
executing the pattern: extract-min, insert, where each insertion involves the
next element from the input list that contains the previously extracted
element. An application of the $O(\log \min(n, k))$ result shows that this
approach to list merging achieves optimal performance, within constant factors,
in both pairing and top-down skew heaps.
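The extract-min, insert pattern of the alternative merging approach can be sketched as below. Note the substitution: Python's `heapq` is a binary heap, not a pairing heap, so this only illustrates the control flow and carries none of the distribution sensitive bounds discussed here. The function name `merge_sorted_lists` is hypothetical.

```python
import heapq


def merge_sorted_lists(lists):
    """Merge sorted lists by repeating: extract-min, then insert the successor."""
    # Seed the heap with the smallest element of each non-empty list.
    heap = [(lst[0], idx, 0) for idx, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, idx, pos = heapq.heappop(heap)       # extract-min
        merged.append(value)
        if pos + 1 < len(lists[idx]):               # insert next item of that list
            heapq.heappush(heap, (lists[idx][pos + 1], idx, pos + 1))
    return merged


result = merge_sorted_lists([[1, 4, 9], [2, 3], [], [0, 10]])
```

The heap never holds more than one element per input list, which is why the number of operations since an element's insertion, rather than the total input size, governs the cost in the pairing heap analysis.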

2 Pairing Heaps 

A pairing heap is a heap ordered general tree. The basic operation on a pairing
heap is the pairing operation, which combines two pairing heaps by attaching
the root with the larger key value to the other root as its leftmost child.
Priority queue operations are implemented in a pairing heap as follows:
Make-heap creates a new single node heap. Find-min returns the data in the root
of the heap. Meld pairs the roots of the two heaps. Insert pairs the new node
with the root of the heap. Decrease-key breaks off the node and its subtree
from the heap (if the node is not the root), decreases the key value, and then
pairs it with the root of the heap. Delete breaks off the node to be deleted
and its subtree, performs an extract-min on the subtree, and pairs the
resultant tree with the root of the heap. Extract-min removes and returns the
root, and then pairs the remaining trees two at a time. Then, the remaining
trees from right to left are incrementally paired. All pairing heap operations
take constant actual time, except extract-min and delete, which take time
linear in the number of children of the node to be removed. For the purposes of
implementation, pairing heaps are stored as a binary tree using the leftmost
child, right sibling correspondence. Unless otherwise stated, the standard tree
terminology will refer to the general tree representation.

^1 Meld in Fibonacci heaps is typically stated as taking $O(1)$ amortized time.
However, since meld operations must be dominated by make-heap operations, meld
operations can never asymptotically change the runtime of any sequence, and
thus take $O(0)$ amortized time.
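The operations described above can be sketched in a few lines. This is a minimal illustration in the general tree representation, with hypothetical names (`Node`, `pair`, `insert`, `extract_min`, `heap_sort`); it omits decrease-key and delete, which need parent links, and is not tuned for performance.

```python
from collections import deque


class Node:
    def __init__(self, key):
        self.key = key
        self.children = deque()   # leftmost child first


def pair(a, b):
    """Pairing: the root with the larger key becomes the leftmost child of the other."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    a.children.appendleft(b)
    return a


def insert(root, key):
    return pair(root, Node(key))


def extract_min(root):
    kids = list(root.children)
    # First pass: pair the children two at a time.
    firsts = [pair(kids[i], kids[i + 1] if i + 1 < len(kids) else None)
              for i in range(0, len(kids), 2)]
    # Second pass: incrementally pair the resulting trees from right to left.
    new_root = None
    for tree in reversed(firsts):
        new_root = pair(tree, new_root)
    return root.key, new_root


def heap_sort(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    out = []
    while root is not None:
        k, root = extract_min(root)
        out.append(k)
    return out
```

Meld of two heaps is simply `pair(root1, root2)`, which is why it takes constant actual time; all the restructuring work is deferred to extract-min.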

3 Constant Amortized Time Insert and Zero Amortized 
Time Meld in Pairing Heaps 

We claim that in a pairing heap the amortized runtime of find-min, make-heap,
and insert is $O(1)$, meld is $O(0)$ and decrease-key, delete and extract-min
is $O(\log n)$. The $n$ used in the analysis is the number of items in the heap
that will be removed during the execution sequence in question, rather than
simply the total number of items in the heap. Proving these amortized costs is
equivalent to proving the following statement:

Theorem Given a sequence $S = s_1 \ldots s_m$ of $m$ operations, where
$D = \{i \mid s_i$ is an extract-min, decrease-key or delete operation$\}$,
$C = \{i \mid s_i$ is a find-min, make-heap, or insert operation$\}$, and $n_i$
is the size of the heap before the execution of $s_i$, $S$ can be executed on
an initially empty forest of pairing heaps in time
$O(|C| + \sum_{i \in D} \log n_i)$. Note that meld operations are allowed but
can never asymptotically affect the runtime of any sequence; that is the
meaning of $O(0)$ amortized time.

Proof The potential method is used. The amortized time of operation $i$,
$\hat{a}_i$, is defined to be the actual time of the operation, $a_i$, plus the
change in potential, $\Phi_i - \Phi_{i-1}$. Summing over the sequence $S$ and
rearranging yields:
$\sum_{i=1}^{m} a_i = \sum_{i=1}^{m} \hat{a}_i + \Phi_0 - \Phi_m$. Thus, the
actual runtime of a sequence of operations is equal to the sum of the amortized
time of the operations plus the net loss of potential. Note that the amortized
times that we prove below are different than the ones stated above; we prove
these amortized times to bound the runtime of the entire sequence, and that in
turn proves our originally stated amortized costs.

For the analysis, a color, black or white, is assigned to every node, and a
weight is assigned to those nodes colored white. A node is black if it will
remain in the forest of heaps at the end of execution of sequence $S$, and
white otherwise. We say that a white node is heavy if the number of white nodes
in its left subtree in the binary representation is greater than or equal to
the number of white nodes in its right subtree. Roots and leaves are always
heavy by this definition and every node can have a maximum of $\log n$ heavy
children. We say that a white node that is not heavy is light.

We say a white node has been captured if its parent is black. Captured 
nodes must have either a decrease-key or a delete performed on them later in 
the execution sequence, and until such time they are not involved in any pairings. 

The potential of a white node is the sum of four components: rank potential,
weight potential, triple white potential, and capture potential. The rank
potential of a white node is 9 times the logarithm of the number of white
nodes in its induced subtree in the binary representation. If a node is white
and has right and left siblings that are also white, then the node has a triple
white potential of $-6$, else it has no triple white potential. Heavy nodes
have a weight potential of $-6$, and light nodes have no weight potential. We
assign captured nodes a capture potential of $-6$ and non-captured nodes no
capture potential. The potential of a black node is $-6$ if its parent, in the
general representation, is black, and 0 otherwise. The potential of a forest of
heaps, $\Phi$, is the sum of the potentials of the nodes in the heaps.

The amortized cost of each operation is now calculated using this potential 
function: 

Amortized cost of create-heap is $O(1)$: Actual cost is 1, and the newly
inserted node has at most 0 potential (this is true if it is white or black).
Thus the amortized cost is at most 1.

Amortized cost of insert is $O(\log n)$, if the newly inserted node is white:
Actual cost is 1. The newly inserted node will have a potential of at most
$9 \log n$. The old root is the only other node that will change potential,
gaining at most $9 \log n$. Thus the amortized cost of this operation, which is
the sum of the actual cost and the change in potential, is $1 + 18 \log n$.

Amortized cost of insert is $O(1)$, if the newly inserted node is black: Actual
cost is 1, and there are no gains of potential, for an amortized cost of 1.

Amortized cost of meld on two trees with black roots, or on a tree with a white
root and a heap composed entirely of black nodes, is $O(1)$: Actual cost is 1,
and there are no potential increases. Thus the amortized cost is 1.

Amortized cost of meld on two heaps with white roots, or on one heap with a
white root and one heap with a black root that contains at least one captured
white node, is $O(\log n)$: Actual cost is 1. At most two nodes, the roots of
the trees to be melded, increase potential. Both could gain rank potential, up
to $9 \log n$ each. Also the root with smaller key value could change from
heavy to light, causing a gain of 6. Thus the amortized cost is
$7 + 18 \log n$.

Amortized cost of decrease-key is $O(\log n)$: Actual cost is 1. The node on
which the decrease-key is performed could gain as much as $9 \log n$ in rank
potential, 6 in weight potential, and 6 in capture potential. Among the node on
which the decrease-key is performed, and its two former siblings to the left
and right, a total of 6 units of triple white potential can be gained. Also, on
the path from the node on which the decrease-key is to be performed to the
root, the removal of the node and its subtree may cause some nodes to change
their status from light to heavy or vice versa. Only changing from heavy to
light causes a potential gain, and this gain of 6 can only happen in $\log n$
nodes. Thus the amortized cost is $19 + 15 \log n$.
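The constant in the decrease-key bound can be checked by adding up the contributions named in the paragraph above. The display below is only a restatement of that arithmetic, with each term labeled by its source:

```latex
\begin{align*}
\hat{a}
 &\le \underbrace{1}_{\text{actual}}
  + \underbrace{9\log n}_{\text{rank}}
  + \underbrace{6}_{\text{weight}}
  + \underbrace{6}_{\text{capture}}
  + \underbrace{6}_{\text{triple white}}
  + \underbrace{6\log n}_{\text{heavy to light on the path}} \\
 &= 19 + 15\log n .
\end{align*}
```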

Amortized cost of extract-min is $O(\log n)$:

If there are $c$ children of the root, the actual cost is $c - 1$. The removal
of the root itself causes a potential gain of at most 6, since it was heavy,
and is not captured or a triple white. There are at most $\log n$ heavy
children of the root, and so the potential gain caused by heavy nodes becoming
light is at most $6 \log n$. Given that there are $w$ white-white pairings in
the first pairing pass, the first pairing pass causes a rank potential gain of
at most $18 \log n - 18w$. The second pairing pass causes a rank potential gain
of at most $9 \log n$. The derivation of the changes in rank potential may be
found in pj .

The extract-min operation can cause no increase in capture potential, as none
of the children of the root are captured.

In order to analyze other changes in potential (changes in triple white poten- 
tial, losses of weight potential caused by a node becoming heavy, and changes in 
black nodes’ potential) we break the children of the root into blocks of six nodes, 
excluding the rightmost two nodes. At most 7 nodes can not be included in this 
analysis, and they could incur a potential gain of up to 12 each. In analyzing 
these specific potential changes in each block of six, there are six cases. 

Case 1: There is at least one white-white pairing in the first pairing pass.

The only gains in potential are the possible gain of 6 units of triple white
potential for each white node involved in a white-white pairing. This is the
only case where a gain in potential, among the components of the potential
function under consideration, is possible.

Case 2: There are no white-white pairings and at least one black-black pairing
in the first pairing pass.

The black-black pairing(s) cause a loss of at least 6 units of potential.

Case 3: All are black-white pairings in the first pairing pass, and at least one
of the three white nodes is captured.

The capturing of the node(s) causes a capture potential loss of at least 6.

Case 4: All are black-white pairings in the first pairing pass, but all three
nodes that participate in the second pairing pass lose.

Having all three lose the pairings in the second pairing pass causes a loss of
potential of 6, due to the change of status of the middle white node to a
triple white.

Case 5: All are black-white pairings in the first pairing pass, and at least one
of the three nodes that participate in the second pairing pass wins, and at
least one of the white nodes is light.

The light node becomes heavy, as all nodes previously on its right are now
in its subtree. Additional nodes that were to the node's left may also be added
to its subtree, but this just makes it more heavy. This causes a loss of 6
units of weight potential.

Case 6: All are black-white pairings in the first pairing pass, and at least one
of the nodes that participate in the second pairing pass wins, and all of the
whites that win in the second pairing pass are heavy.

There is no potential gain. This case can only happen $\log n$ times, because
there are at most $\log n$ heavy children of the root.

Case 1 causes a potential gain of at most $12w$, and cases 2-6 cause a
potential loss of at least $6(\lfloor (c-2)/6 \rfloor - w - \log n)$. These
potential changes are in addition to the gain of at most $33 \log n - 18w$
discussed earlier. Thus summing the actual cost of the extract-min operation,
$c - 1$, with the maximum potential gain yields an amortized cost of
$89 + 39 \log n$.

The amortized cost of delete is $O(\log n)$: If the node to be deleted has $c$
children, the actual cost to delete a node is $c - 1$. The removal of the node
and its subtree can cause a weight potential gain of $6 \log n$ in its
ancestors. Among the node on which the delete is performed, and its two former
siblings to the left and right, a total of 6 units of triple white potential
can be gained. Performing an extract-min on the newly extracted subtree causes
a potential gain of at most $89 + 39 \log n - c$ (see analysis of extract-min
above). Pairing the resultant tree with the root of the original tree causes a
potential gain of at most $6 + 18 \log n$ (see analysis of meld). Thus the
total potential gain is at most $101 + 63 \log n - c$ and the amortized cost is
$100 + 63 \log n$.
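The delete bound is the sum of the four potential contributions just listed plus the actual cost. Written out as a check of the arithmetic only:

```latex
\begin{align*}
\Delta\Phi
 &\le \underbrace{6\log n}_{\text{ancestors}}
  + \underbrace{6}_{\text{triple white}}
  + \underbrace{(89 + 39\log n - c)}_{\text{extract-min on subtree}}
  + \underbrace{(6 + 18\log n)}_{\text{final pairing}}
  = 101 + 63\log n - c, \\
\hat{a}
 &\le (c - 1) + 101 + 63\log n - c
  = 100 + 63\log n .
\end{align*}
```

The term $-c$ in the potential gain cancels the linear actual cost, which is what makes the amortized bound logarithmic.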

The total potential loss over the execution of the sequence $S$,
$\Phi_0 - \Phi_m$, is at most $6|C|$: The initial potential of the empty data
structure is zero. At the end of the execution sequence, the data structure is
a forest of black nodes. Each non-root node has a potential of $-6$. Since
$|C|$ is at least the total number of nodes inserted into the structure, the
total potential loss is at most $6|C|$.

The sum of the amortized cost to perform the extract-min, decrease-key and
delete operations is $O(\sum_{i \in D} \log n_i)$. It can also be seen that the
sum of the $O(\log n)$ amortized costs to insert white nodes and to perform
meld operations on two heaps where the root of one of the heaps is white and
the other heap contains at least one white node can not exceed
$O(\sum_{i \in D} \log n_i)$. Thus the sum of the amortized costs of all
operations except create-heap, insert on a black node, and meld where both
roots are black or at least one of the two heaps is entirely composed of black
nodes is $O(\sum_{i \in D} \log n_i)$. There can not be more than $|C|$
insertions of black nodes, meld operations where both roots are black or at
least one of the two heaps is entirely composed of black nodes, or create-heap
operations, all of which we have shown take constant amortized time. Therefore,
the sum of the amortized times of all of the operations is at most
$O(|C| + \sum_{i \in D} \log n_i)$. Adding the maximum potential drop of
$O(|C|)$ to the sum of the amortized costs yields a bound of
$O(|C| + \sum_{i \in D} \log n_i)$ on the actual runtime of the execution
sequence $S$.
4 The Working Set Theorem

Terminology: Let $t_i(x)$ = 1 + the number of items in the heap at time $i$
that were not in the heap when $x$ was inserted.^2 If time $i$ is before $x$
was inserted, $t_i(x) = 0$. Let $T_i(x) = \max_{j \le i} t_j(x)$. Furthermore,
assuming $x$ was inserted at time $a$ and the maximum size of the heap from
time $a$ to $b$ is $n$, $T_b(x) \le \min(b - a, n)$. The value of $T_i(x)$ is
nondecreasing in $i$. We use $n_i$ to denote the current number of nodes in the
heap and $\eta_i$ to denote the maximum value of $T_j(x)$ among all nodes $x$,
among all times $j$ up to and including $i$. Note that $\eta_i$ is also equal
to the maximum size of the heap up to and including time $i$. In this section
the standard tree terminology refers exclusively to the binary representation.
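The quantities $t_i(x)$ and $T_i(x)$ can be computed for a small operation sequence as sketched below. Several conventions here are assumptions made for illustration: values are recorded after each operation completes, the items "not in the heap when $x$ was inserted" are taken to be those inserted after $x$, and an item's $T$ value stops changing once it is extracted. The function name `working_set_numbers` is hypothetical.

```python
def working_set_numbers(ops):
    """Return {x: T(x)} after running ops, a list of ("insert", x) / ("extract", x)."""
    heap = []   # items currently in the heap, in insertion order
    T = {}
    for op, x in ops:
        if op == "insert":
            heap.append(x)
        else:
            heap.remove(x)
        n = len(heap)
        for pos, y in enumerate(heap):
            t = 1 + (n - 1 - pos)          # 1 + items inserted after y, still present
            T[y] = max(T.get(y, 0), t)     # T is the running maximum of t
    return T


ops = [("insert", "a"), ("insert", "b"), ("insert", "c"),
       ("extract", "b"), ("insert", "d")]
T = working_set_numbers(ops)
```

In this run the items still in the heap at the end (a, c, d) have pairwise distinct values, with the earliest inserted item having the largest one.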

Working Set Theorem for Pairing Heaps: 

Let $A = a_1 \ldots a_m$ be a sequence of $m$ insert and extract-min operations
performed on an initially empty heap. Let $I = \{i \mid a_i$ is an insert
operation$\}$ and $E = \{i \mid a_i$ is an extract-min operation$\}$. Let $r_i$
denote the root at time $i$; thus if $e \in E$, then $r_e$ is the item
extracted at time $e$. The time to sequentially perform the operations in $A$
on a pairing heap is:

$$O\Bigl(n_m \log \eta_m + \sum_{i \in E} \log T_i(r_i)\Bigr)$$



Proof: 

The potential method is used to analyze these operations. In order to analyze
the pairing heap, we will introduce the notion of a dummy node. Each node will
be inserted with a dummy node, and when extracted, the dummy node will
either be extracted with it, or on the subsequent operation, if the subsequent
operation is an insert. The dummy node of $x$, denoted as $d(x)$, will be
located as $x$'s rightmost child. The position of the dummy nodes is invariant
under the pairing operation. The dummy nodes are for analysis purposes only;
they are not stored in an implementation. In the node counting required to
compute the $t$ function defined above, we do not count dummy nodes. We instead
adopt the convention that $t_i(x) = t_i(d(x))$.

Lemma For any fixed time $i$, there can never exist two nodes $x$ and $y$ such
that $T_i(x) = T_i(y)$, unless one is the dummy node of the other.

Proof Assume $x$ was inserted before $y$. Thus at any time $j$ when both $x$
and $y$ are in the heap, $t_j(x) > t_j(y)$. This is true because every node
inserted after $y$ that is still in the heap is a node that was inserted after
$x$ that is still in the heap, and $y$ itself contributes 1 to $t_j(x)$ but not
to $t_j(y)$.



^2 Note that time $i$ refers to the state of the data structure before the
$i$th operation has completed. Time zero refers to the initial empty state of
the data structure.
Definitions

$f(t) = \frac{1}{2(1+t)^2}$

$S_i(x) = \{y \mid y$ is in the subtree induced by $x$ at time $i\}$

$r_i$ = the root at time $i$

$\epsilon_i = -2p_{i-1}(r_{i-1})$ if $i - 1 \in E$ and $i \in I$, 0 otherwise

$r_i(x) = \sum_{y \in S_i(x)} f(T_i(y))$

$p_i(x) = \log r_i(x)$

$\Phi_i = \epsilon_i + \sum_{y \in S_i(r_i)} p_i(y)$
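Two facts about the weight function explain the bounds of the form $1 + 2\log(1 + T)$ that recur in the proofs below. They are worked out here under the assumption that $f(t) = 1/(2(1+t)^2)$ and that logarithms are base 2; the formula for $f$ is a reading of a damaged line, so treat this as a sketch:

```latex
\begin{align*}
-\log f(t) &= \log\bigl(2(1+t)^2\bigr) = 1 + 2\log(1+t), \\
\sum_{t \ge 1} f(t) &= \frac{1}{2}\sum_{t \ge 1}\frac{1}{(1+t)^2}
   = \frac{1}{2}\Bigl(\frac{\pi^2}{6} - 1\Bigr) < \frac{1}{2}.
\end{align*}
```

Since a given value of $T_i$ is shared by at most a node and its dummy, the second line implies that every $r_i(x)$ is below 1, so every $p_i(x)$ is negative; and since $r_i(x) \ge f(T_i(x))$, the first line gives $-p_i(x) \le 1 + 2\log(1 + T_i(x))$.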

The amortized time of operation $i$, $\hat{a}_i$, is defined to be the actual
time of the operation, $a_i$, plus the change in potential,
$\Phi_{i+1} - \Phi_i$. Summing and rearranging terms yields
$\sum_{i=1}^{m} a_i = \sum_{i=1}^{m} \hat{a}_i + \Phi_1 - \Phi_{m+1}$. Thus the
actual runtime of a sequence of operations is equal to the sum of the amortized
time of the operations plus the net loss of potential. To prove the working set
theorem, it shall be sufficient to prove the following three lemmas:

Lemma (Potential loss): $\Phi_1 - \Phi_{m+1} = O(n_m \log \eta_m)$

Lemma (insert): If $i \in I$, $\hat{a}_i = 0$.

Lemma (extract-min): If $i \in E$, $\hat{a}_i = O(\log T_i(r_i))$

Proof (Potential loss lemma): 



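$$\begin{aligned}
\Phi_1 - \Phi_{m+1}
 &\le \epsilon_1 - \epsilon_{m+1} + \sum_{x \in S_1(r_1)} p_1(x)
      - \sum_{x \in S_{m+1}(r_{m+1})} p_{m+1}(x) \\
 &\le -\sum_{x \in S_{m+1}(r_{m+1})} \log r_{m+1}(x) \\
 &\le -n_{m+1}\bigl(-2\log(1 + \eta_{m+1}) - 1\bigr) \\
 &= O(n_m \log \eta_m)
\end{aligned}$$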

Proof (Extract-min Lemma): We track the change in potential over every
step involved in the extract-min operation: the removal of the root, the first
pairing pass, the second pairing pass, the removal of the dummy node, and the
changing of $\epsilon$, $T(x)$, and $n$. We assume the current operation is
$a_i$.

Removal of root: The removal of the root causes a change in potential of

$$-p_i(r_i) = -\log r_i(r_i) \le 1 + 2\log(1 + T_i(r_i))$$

First Pairing Pass: The effect on the binary representation of the first
pairing pass may be viewed as a sequence of applications of the double pairing
transformation illustrated in Figure 1, along with at most one application of a
single pairing transformation from Figure 2.
[Figure panels: Before pairing; After pairing, Case 1; After pairing, Case 2;
After pairing, Case 3; After pairing, Case 4]

Fig. 1. Effect of first pairing pass on two pairs of nodes



Lemma: A single application of a double pairing transformation causes a
potential gain of at most $4p_i(A) - 4p_i(D) - 2$.

Proof: The proof for the first case is provided; the remaining three are
substantially similar.

$$\begin{aligned}
 & p_i(A') + p_i(B') + p_i(C') + p_i(D') - p_i(A) - p_i(B) - p_i(C) - p_i(D) \\
 &= \log r_i(A') + \log r_i(C') + \log r_i(D')
    - \log r_i(B) - \log r_i(C) - \log r_i(D) \\
 &= \log r_i(A') + \log r_i(D) + \log 4 + \log r_i(C') + \log r_i(D')
    - \log r_i(B) - \log r_i(C) - 2\log r_i(D) - 2 \\
 &\le \log\bigl(4\, r_i(A')\, r_i(D)\bigr) + 2\log r_i(A) - 4\log r_i(D) - 2 \\
 &\le 2\log\bigl(r_i(A') + r_i(D)\bigr) + 2\log r_i(A) - 4\log r_i(D) - 2 \\
 &\le 4\log r_i(A) - 4\log r_i(D) - 2 \\
 &= 4p_i(A) - 4p_i(D) - 2
\end{aligned}$$
Lemma: A single application of a single pairing transformation from Figure 2
causes a potential gain of at most
$\log r_i(A) - \log r_i(B) \le 4\log r_i(A) - 4\log r_i(B)$.

Proof: Again, the proof for the first case is provided, with the second case
being similar:

$$\begin{aligned}
 p_i(A') + p_i(B') - p_i(A) - p_i(B)
 &= p_i(B') - p_i(B) \\
 &\le p_i(A) - p_i(B) \\
 &\le 4p_i(A) - 4p_i(B)
\end{aligned}$$



If the number of pairings required is even, then the first pairing pass is
analyzed by repeatedly applying double transformations. If the number of
pairings required is odd, the pairings are carried out by using one
single-pairing transformation and several double-pairing transformations. Note
that $d(r_i)$, which will be at the bottom of the right path, does not
participate in the transformations, as it is not actually stored in the heap.
Let $l$ be the number of pairing transformations performed. All of the
transformations applied form a telescoping sum with the result that the total
potential gain is at most

$$\begin{aligned}
 4p_i(L(r_i)) - 4p_i(d(r_i)) - (l - 1)
 &\le 9 + 4\log(T_i(d(r_i))) - l \\
 &\le 9 + 4\log(T_i(r_i)) - l
\end{aligned}$$




[Figure panels: Before pairing; After pairing, case 1; After pairing, case 2]

Fig. 2. Effect of a single pairing



Second pairing pass: The second pairing pass can be viewed as a sequence
of single transformations. As stated above, the potential change of each
application of a single transformation is at most
$\log r_i(A) - \log r_i(B)$. Repeatedly applying this transformation going up
the right path generates a telescoping sum. Since the dummy node of the
recently extracted node still lies at the bottom of the right path, the sum is
bounded by:

$$1 + 2\log(T_i(r_i) + 1)$$

Removal of dummy node: Removing $d(r_i)$ causes a potential gain of

$$-p_i(d(r_i)) \le 1 + 2\log(1 + T_i(d(r_i))) = 1 + 2\log(1 + T_i(r_i))$$

The dummy node is only removed if the next operation is not an insertion. In
any event the change in potential caused by the possible removal of the dummy
node is

$$\le 1 + 2\log(T_i(r_i) + 1)$$

Setting of $\epsilon_\tau$: $\epsilon$ is assigned the value
$-2\rho_\tau(r_\tau)$ only if the next operation is an insert. As the previous
value of $\epsilon$ is zero, this causes a potential gain of at most

\[ -2\rho_\tau(r_\tau) \le 2 + 4\log(T_\tau(r_\tau) + 1) \]

Changing of $T(x)$ and $n$: For all nodes $x$ (except $r_\tau$, which has been
removed), $T_{\tau+1}(x) = T_\tau(x)$. Thus no potential change due to the
changing of the $T$ values occurs. The removal of a node causes a potential
gain of 6.

Summary: The amortized cost of a remove-min is the actual cost plus the
change in potential. If we charge $\frac{1}{2}$ unit of time for each pairing
used, the actual cost is $a \le l + \frac{1}{2}$. Note that by doubling the
potential function it is possible to charge the more pleasing one unit of time
per operation; however, this was not done, to simplify the presentation. Thus
combining the actual cost with the changes in potential for the removal of the
root, the first and second pairing passes, the removal of the dummy node, and
the possible setting of $\epsilon$ yields:

\[
\begin{aligned}
\hat{a} &= a + \Phi_{\tau+1} - \Phi_\tau \\
  &\le l + \tfrac{1}{2} + 12\log(T_\tau(r_\tau) + 1) + 12 - l \\
  &\le 12\log(T_\tau(r_\tau) + 1) + \tfrac{25}{2} \\
  &\le O(\log T_\tau(r_\tau))
\end{aligned}
\]

Proof (Insert Lemma): Inserting a new node $x$ into a heap with root
$r_{\tau-1}$ will have two possible outcomes, depending on whether $x$ is the
smallest element of the new heap or not. This is depicted in Figure 3. Recall
that if the previous operation was an extract-min, then the dummy of the
previous root is still present, and $\epsilon$ is nonzero. This possibility
splits each of the two cases.

We may assume that the possible lowering of the values of $\rho_\tau(x)$ caused
by the possible increase of $T_\tau(x)$ for some nodes $x$ has been performed.
Note that the potential loss lemma limits the total potential decrease over a
sequence of operations, and thus need not be considered here. We also note the
loss of 6 here caused by increasing $n$ by one.




Fig. 3. Effect of insertion of a new element $x$ in a pairing heap. $D$
represents the dummy node that is present if the previous operation was an
extract-min. Panels: before insertion; after insertion, case 1; after
insertion, case 2.



Previous operation was an insert:



Pr{x) +Pr{r'^) -Prirr) < -Pr(?'r) 



Since the last operation was an insert there is one element (either r or an 
element of A) that has T{x) = 2. Thus 

-PriXr) < - logf{Tr{2)) < log 18 

Previous operation was an extract-min: Removing the dummy causes
a potential gain of $-\rho_\tau(d(r_{\tau-1}))$. Decreasing $\epsilon$ to zero
causes a potential gain of $2\rho_{\tau-1}(d(r_{\tau-1}))$. It is also the case
that $\rho_\tau(r) \ge \rho_\tau(d(r_{\tau-1}))$. Thus the total potential gain
is:

Pr{r') +Pr{x) -Pr{r) ~ Pr{D) - Cr 

= -Pr{d{rr-l) + -Pr{d{rr-l) + +2pr-l{d{rr-l)) 

< 0 

Summary: Since the actual cost of the insert operation is $a = 1$, and the
gain in potential is at most $\log 18$, the amortized cost is:

\[ \hat{a} = 1 + \log 18 - 6 \le 0 \]



4.1 The Working Set Theorem for Top Down Skew Heaps 

The working set theorem presented above can easily be adapted for skew-pairing
heaps. Skew-pairing heaps were introduced in [5], and implement all operations
except extract-min identically to standard two-pass pairing heaps. In the
skew-pairing heap, extract-min pairs every other tree, incrementally from right
to left. The remaining trees are also paired incrementally from right to left.
Finally, the two resultant trees are paired together. In p| it is shown that
given any sequence of N operations (excluding decrease-key and delete) that
takes time T, the same sequence can be executed on a top-down skew heap in
time T' where
T < T' < riN^ogn]\f. Since < riNlogriN < r/NlogriN, the same asymptotic 
bound of O (riNlogrij^ + ^olds for executing the sequence on a 

top down skew heap. 

4.2 Populate Replace Heaps 

Define a populate-replace heap to be an abstract data structure that supports
the following two operations:

Populate: Given $n$ items, populate inserts them into the heap.

Replace-Min: Replace the minimum element with another.

Theorem: Define $s(x)$ to be the number of items smaller than $x$ when $x$ is
inserted into a heap. In a populate-replace heap implemented as a pairing heap,
three-pass pairing heap, skew-pairing heap, or top-down skew heap, one populate
operation with $n$ items $x_1, \ldots, x_n$, followed by $N$ replace-min
operations, where $y_1, \ldots, y_N$ are the replacement items, takes time
$O(n \log n + \sum_{i=1}^{N} \log s(y_i))$.

Proof: Two observations: First, $n_\tau \le n$ throughout the life of the heap.
Secondly, if, when item $x$ is inserted into a heap of $n$ items, there were
$k$ items smaller than $x$, then there will never be more than $k$ items
inserted after $x$ in the heap concurrently with $x$, if the heap size never
exceeds $n$. This is because the $n - k$ items larger than $x$ upon insertion
will still be present at $x$'s removal. Thus when $x$ is removed,
$T(x) \le k$. These two observations, along with the working set theorem for
pairing heaps, complete the proof. Note that constant time insertion in
three-pass pairing heaps yields a charge of $n$ rather than $n \log n$ in the
above analysis.

4.3 {Pairing, Skew-Pairing, Top-Down Skew} Heaps Merge Sorted 
Lists Optimally within a Constant Factor 

A populate-replace heap, implemented as a pairing, skew-pairing, or top-down
skew heap, can be used to merge the items of $m$ ordered lists of lengths
$n_1, n_2, \ldots, n_m$ by storing one element of each list in the heap, and
then repeatedly removing the smallest item and replacing it with the next item
in its list. Define $k_{ij}$ to be the number of items in the heap when the
$j$-th item from list $x_i$ is removed.
Define N = XXo Note that X^=i 
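
The merging procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: Python's `heapq` module is a binary heap, not a pairing or skew heap, so it shows the populate and replace-min pattern but not the running time claimed for the heaps in this paper. The function name `merge_sorted_lists` is hypothetical.

```python
import heapq


def merge_sorted_lists(lists):
    """Merge already-sorted lists with one populate and repeated replace-min."""
    # Populate: one entry per non-empty list, stored as
    # (value, list index, position in that list).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    merged = []
    while heap:
        value, i, j = heap[0]              # current minimum
        merged.append(value)
        if j + 1 < len(lists[i]):
            # Replace-min: swap the minimum for the next item of its list.
            heapq.heapreplace(heap, (lists[i][j + 1], i, j + 1))
        else:
            heapq.heappop(heap)            # that list is exhausted
    return merged
```

Each list contributes at most one entry to the heap at any time, so the heap never holds more than $m$ items.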

Theorem: Pairing heaps merge m ordered lists size rii,n 2 ■ ■ ■ rim with a total 

of N elements optimally in time 0 (jnlogm -|- -|- XXi (^)) 

Lower bound 

The Information theory bound 

Given a set of $m$ ordered lists $\{x_1, x_2, \ldots, x_m\}$, where list $i$
has $n_i$ elements, we wish to generate one ordered list $X$ with
$N = \sum_{i=1}^{m} n_i$ elements. Since the number of possible orderings of
$X$ is given by the multinomial coefficient
$\binom{N}{n_1, n_2, \ldots, n_m}$, the information theoretic bound on this
problem is

\[ \log \binom{N}{n_1, \ldots, n_m} \]

Other lower bounds 

We assume a lower bound of $\Omega(N)$, since each item must be looked at.

Since merging the lists sorts the $m$ heads of the lists, there is a lower
bound of $\Omega(m \log m)$.

Summary 

Linearly combining the three lower bounds above with suitable constants
yields a lower bound of
$\Omega\left(m \log m + N + \log \binom{N}{n_1, \ldots, n_m}\right)$.

Upper bound 

The total cost of merging the lists according to the populate replace theorem 
is mlogm + X;™iE”lilogA:*j = O (^mlog m + iV + X]™ i log 
Abstract. We show how to maintain centers and medians for a collection of
dynamic trees where edges may be inserted and deleted and node and edge
weights may be changed. All updates are supported in $O(\log n)$ time, where
$n$ is the size of the tree(s) involved in the update.



1 Introduction 

In this paper we study the problem of locating facilities in a collection of dynamic 
trees. For each tree, we wish to maintain (1) a center, which is a node minimizing 
the maximal distance to all other nodes, and (2) a median, minimizing the sum 
of the distances to all other nodes. In both cases, we have edge weights, and
in the latter case, it is relevant to have node weights as well; the cost of
the median is then the weighted sum of distances to all other nodes.

In 1971 Goldman j I Yj gave a linear time algorithm for determining the median 
in a tree. In 1973 Handler m showed how one in linear time can compute the 
center of a tree. The static median and center problems have been investigated 
and generalized in many papers, see e.g. !i8fciiaii| . A long list of references to 
the median and center problem and similar problems can be found in m- 

In our dynamic setting, we allow weights to be changed, and further, edges
may be inserted and deleted. In the rest of this paper, n denotes the size of
the tree(s) involved in an operation. Our main result is that both centers and
medians can be maintained in O(log n) time per update. For centers, the
previous bound was O(log^ n) [^. For medians, polylogarithmic bounds were only
known for changes of node weights [3], but not for edge insertions and
deletions. More precisely, [3] presents an O(log n) bound for the monotone case
where weights may only be increased. If the weights may be both increased and
decreased, they claim an O(log^ n) bound. However, to achieve these results,
they claim they can access subtree weights in constant time, spending O(log n)
per weight update [3, p. 445]. This contradicts a cell-probe lower bound (for
word size log n) saying that an update time of t_u implies a query time of
Ω(log n / log(t_u log^ n)), even if the tree is just a path (prefix-sum)
^31 P- 348 (2)]. Our O(log n) solution to the dynamic median problem does not
maintain all subtree weights. All our algorithms are elementary in that they
can be implemented on a pointer machine m-
A common problem in finding medians and centers is that they are “non-local”
properties. Here, by a local property we mean that if a node has the property
in a tree, then it has the property in all subtrees it appears in. Local
properties lend themselves nicely to bottom-up computations, whereas non-local
properties tend to be more challenging.

A main advantage of our solutions to the dynamic center and median problems
is that they are simple, based on the top trees from 0. Towards the end
of the paper, we will argue that it would have been more technical to solve the
problem with the classical dynamic trees from m. Thus, our methodological
contribution is to pinpoint advantages of designing dynamic tree algorithms
with top trees.

2 Top Trees 

In this preliminary section, we discuss our basic starting point: top trees from j2j. 
Our presentation of the interface will be somewhat more precise and thorough 
than that in The more exact understanding of the interface is needed for 
both our applications, and for our later methodological discussion of top trees 
versus more classical data structures for dynamic trees.

A top tree is defined based on a pair consisting of a tree $T$ and a set
$\partial T$ of at most 2 nodes from $T$, called external boundary nodes. Given
$(T, \partial T)$, any connected subtree $C$ of $T$ has a set
$\partial_{(T,\partial T)}C$ of boundary nodes, which are the nodes of $C$ that
are either in $\partial T$ or incident to an edge in $T$ leaving $C$. The
subtree $C$ is called a cluster of $(T, \partial T)$ if it has at most two
boundary nodes. Then $T$ is itself a cluster with
$\partial_{(T,\partial T)}T = \partial T$. Also, if $A$ is a subtree of $C$,
$\partial_{(C,\partial_{(T,\partial T)}C)}A = \partial_{(T,\partial T)}A$, so
$A$ is a cluster of $(C, \partial_{(T,\partial T)}C)$ if and only if $A$ is a
cluster of $(T, \partial T)$. Since $\partial_{(T,\partial T)}$ is a canonical
generalization of $\partial$ from $T$ to all subtrees of $T$, we will use
$\partial$ as a shorthand for $\partial_{(T,\partial T)}$ in the rest of the
paper.

A top tree $\mathcal{T}$ over $(T, \partial T)$ is a binary tree such that:

1. The nodes of $\mathcal{T}$ are clusters of $(T, \partial T)$.

2. The leaves of $\mathcal{T}$ are the edges of $T$.

3. If $C$ is an internal node of $\mathcal{T}$ with children $A$ and $B$, then
$C = A \cup B$ and $A$ and $B$ are neighbors, that is, they share a single node
(see Figure 1).

4. The root of $\mathcal{T}$ is $T$ itself.

A tree with a single node has an empty top tree. 
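
To make the definition of boundary nodes concrete, here is a small sketch that computes the boundary nodes of a connected subtree directly from the definition and checks the cluster condition. It is a brute-force illustration only, not part of the data structure; the function names and the representation of a subtree as a list of edges are assumptions made for this example.

```python
def boundary_nodes(tree_edges, external, cluster_edges):
    """Boundary nodes of the connected subtree formed by cluster_edges."""
    cluster = {frozenset(e) for e in cluster_edges}
    nodes = set().union(*cluster)          # nodes touched by the subtree
    boundary = nodes & set(external)       # external boundary nodes inside it
    for e in map(frozenset, tree_edges):
        if e not in cluster:               # an edge of T outside the subtree
            boundary |= nodes & e          # leaves it at any endpoint inside
    return boundary


def is_cluster(tree_edges, external, cluster_edges):
    """A connected subtree is a cluster if it has at most two boundary nodes."""
    return len(boundary_nodes(tree_edges, external, cluster_edges)) <= 2
```

On the path a, b, c, d with external boundary node a, the subtree consisting of the edges (a, b) and (b, c) has boundary nodes a and c: a is external, and c is incident to the edge (c, d) leaving the subtree.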

The top trees over the trees in our forest are maintained under the following 
operations: 

Link(v, w): where v and w are in different trees, links these trees by adding
the edge (v, w) to our dynamic forest.

Cut(e): removes the edge e from our dynamic forest.

Expose(v, w): where v and w are in the same tree T, makes v and w the external
boundary nodes of T. Moreover, Expose returns the new root cluster of the top
tree over T.

Expose can also be called with one or zero nodes as arguments if we want fewer
than two external boundary nodes.
Fig. 1. Combining two clusters into one. The boundary nodes and cluster paths
in the figure are for the resulting cluster.



In general, Link and Cut make the set of external boundary nodes for the
resulting trees empty. Every update of the top tree can be implemented as a
sequence of the following two operations:

Merge(A, B): where A and B are the root clusters of two top trees
$\mathcal{T}_A$ and $\mathcal{T}_B$. Creates a new cluster $C = A \cup B$ and
makes it the common root of A and B, thus turning $\mathcal{T}_A$ and
$\mathcal{T}_B$ into a single new top tree $\mathcal{T}_C$. Finally, the new
root cluster C is returned.

Split(C): where C is the root cluster of a top tree $\mathcal{T}_C$ and has
children A and B. Deletes C, thus turning $\mathcal{T}_C$ into the two top
trees $\mathcal{T}_A$ and $\mathcal{T}_B$.

Recall that n denotes the size of the trees involved in a given update operation. 
From pnn we have: 

Theorem 1 For a dynamic forest we can maintain top trees of height $O(\log n)$
supporting each Link, Cut, or Expose with a sequence of $O(\log n)$ Merge and
Split. Here the sequence itself is identified in $O(\log n)$ time. The space
usage of the top trees is linear in the size of the dynamic forest.

Note that since the height is maintained logarithmic, any edge is contained
in at most $O(\log n)$ clusters. In contrast, a node $v$ can appear in $O(n)$
clusters. However, $v$ can only be a non-boundary node in $O(\log n)$ clusters.
More precisely, if $v$ is not an external boundary node, there is a unique
cluster $C := \mathrm{Merge}(A, B)$ where $\{v\} = A \cap B$ and
$v \notin \partial C$. Then $v$ is a non-boundary node in a cluster $D$ if and
only if $D = C$ or $D$ is one of the $O(\log n)$ ancestors of $C$.

Notation If $v$ and $w$ are connected in a tree, $v \cdots w$ denotes the
unique path from $v$ to $w$. If a cluster $C$ has two boundary nodes $a$ and
$b$, we call $a \cdots b$ the cluster path of $C$, denoted $\pi(C)$. If
$|\partial C| < 2$, $\pi(C) = \emptyset$. Note that if $A$ is a child cluster
of $C$ and $A$ shares an edge with $\pi(C)$, then $\pi(A) \subseteq \pi(C)$,
and then we call $A$ a path child of $C$. In terms of boundary nodes, if $C$
has children $A$ and $B$, $A$ is a path child of $C$ if and only if
$|\partial C| = 2$ and either $\partial A = \partial C$ (Fig. 1(2)) or
$\partial C \subsetneq \partial A \cup \partial B$ (Fig. 1(1)).

Representation and usage of top trees A top tree is represented as a standard
binary rooted tree with parent and children pointers. The “top” nodes of the
binary tree represent the clusters, and with each top node is associated the
set of at most two boundary nodes of the represented cluster. The leaves of the
binary top tree are still identified with the edges of our tree. Finally, from
each node $v$, there is a pointer $C(v)$ to the smallest cluster that $v$ is a
non-boundary node in, or to the root cluster containing $v$ if $v$ is an
external boundary node.

The user of the top tree data structure has direct access to the above rep- 
resentation, and will typically associate some extra information with the top 
nodes. The user is guaranteed that the top tree is only modified with Merge and 
Split. In connection with each Merge and Split the user is notified and given 
pointers to the top nodes representing the involved clusters. The user can then 
update his information associated with these top nodes. 

For example, suppose, as in m, that we want to maintain the minimum
weight on the path between any two vertices. Then with each (top node
representing a) cluster $C$, we store as extra information the minimum weight
$W_C$ on the cluster path $\pi(C)$. For an edge, this is just the edge weight.
When $C$ is created by a merge, we store the minimum weight stored at its path
children. When $C$ is split, we just discard the information stored with $C$.
Now, to find the minimum weight between $v$ and $w$, we set
$C := \mathrm{Expose}(v, w)$. Then $\pi(C) = v \cdots w$, and we return $W_C$.

Together with Theorem 1, the above description of how to modify and use
our extra information $W_C$ allows us to conclude that we can maintain a
dynamic collection of trees with Cut, Link and queries to minimum weights
between given
nodes in O(logn) time per operation. 
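
The bookkeeping for the minimum weight on a cluster path can be sketched as follows. This is a hedged illustration, not the authors' implementation: there is no driver here, so the boundary set of every cluster is passed in by hand (in the real structure the driver determines it), and the names `Cluster`, `edge`, `merge` and `path_children` are hypothetical. Path children are found from boundary nodes only: if the node shared by the two children is not a boundary node of the parent, both children are path children; otherwise the path child is the one whose boundary equals the parent's.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    boundary: frozenset                  # at most two boundary nodes
    left: Optional["Cluster"] = None     # children; None for a leaf (an edge)
    right: Optional["Cluster"] = None
    min_w: float = math.inf              # W_C: minimum weight on the cluster path


def path_children(c):
    """Children of c whose cluster path is part of the cluster path of c."""
    if c.left is None or len(c.boundary) != 2:
        return []
    (shared,) = c.left.boundary & c.right.boundary   # the single shared node
    if shared not in c.boundary:
        return [c.left, c.right]         # the path runs through both children
    return [d for d in (c.left, c.right) if d.boundary == c.boundary]


def edge(weight, boundary):
    """Leaf cluster for a single edge; its W is just the edge weight."""
    return Cluster(frozenset(boundary), min_w=weight)


def merge(a, b, boundary):
    """User-side work for Merge: W_C is the minimum over the path children."""
    c = Cluster(frozenset(boundary), a, b)
    c.min_w = min((d.min_w for d in path_children(c)), default=math.inf)
    return c
```

For a star with center x, leaves a, b, c, and external boundary nodes a and c, the edge (x, b) hangs off the path from a to c and is never a path child, so its weight does not influence the answer.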

We shall refer to the algorithm from Theorem 1 that translates Cut, Link,
and Expose into sequences of Merge and Split as a driver. In our description of 
our extra information, we did not need to worry about how the driver scheduled 
the Merge and Split; we just had to tell how information had to be modified in 
connection with an arbitrary Merge and Split. 

Above, Split was trivial. To see its relevance, suppose that, as an
additional operation, we want to add a weight $x$ to all edges on a path
$v \cdots w$. Then, for each cluster $C$, we introduce a “lazy” weight
$\Delta_C$ which is to be added to all edges in $\pi(C)$ in all clusters
strictly descending from $C$. The addition of $x$ to $v \cdots w$ is now done
by calling $C := \mathrm{Expose}(v, w)$ and adding $x$ to $W_C$ and to
$\Delta_C$. Then Split($C$) requires that for each path child $A$ of $C$, we
set $W_A := W_A + \Delta_C$ and $\Delta_A := \Delta_A + \Delta_C$. For
$C := \mathrm{Merge}(A, B)$, we set $W_C := \min\{W_A, W_B\}$ and
$\Delta_C := 0$. Finally, to find the minimum weight on the path $v \cdots w$,
we set $C := \mathrm{Expose}(v, w)$ and return $W_C$.
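
The lazy scheme can be sketched in the same style. Again this is an illustration under assumptions: boundary sets are supplied by hand because no driver is modelled, all names are hypothetical, and `add_to_path` assumes its argument is the root cluster returned by Expose(v, w). The sketch takes the minimum over path children in `merge`, following the earlier description of $W_C$.

```python
import math


class Cluster:
    def __init__(self, boundary, weight=math.inf, children=()):
        self.boundary = frozenset(boundary)
        self.children = children      # () for a leaf (a single edge)
        self.min_w = weight           # W_C
        self.lazy = 0                 # Delta_C, pending addition for descendants


def path_children(c):
    if not c.children or len(c.boundary) != 2:
        return []
    a, b = c.children
    (shared,) = a.boundary & b.boundary
    if shared not in c.boundary:
        return [a, b]
    return [d for d in (a, b) if d.boundary == c.boundary]


def merge(a, b, boundary):
    c = Cluster(boundary, children=(a, b))   # Delta_C starts at 0
    c.min_w = min((d.min_w for d in path_children(c)), default=math.inf)
    return c


def split(c):
    for d in path_children(c):        # push the pending addition one level down
        d.min_w += c.lazy
        d.lazy += c.lazy
    return c.children


def add_to_path(root, x):
    """Add x to every edge on the cluster path of the exposed root cluster."""
    root.min_w += x
    root.lazy += x
```

Only the root is touched when the weight is added; the addition reaches a descendant when the clusters above it are split.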

Put in perspective, our top trees are natural generalizations of standard
balanced binary trees over dynamic collections of lists that may be
concatenated and split. In the balanced binary trees, each node represents a
segment of a list, which in top terminology is just a special case of a
cluster. Standard drivers for balanced binary trees also ascertain that the
height is $O(\log n)$, and that each concatenation can be done by $O(\log n)$
local modifications, called rotations.

3 Non-local Searching

We are now going to build a black box on top of our top trees for maintenance
of centers and medians. As discussed in the introduction, the common feature
of centers and medians is that they represent non-local properties. Here
a node/edge property is local if its being satisfied by a node in a tree
implies that the node satisfies the property in all subtrees containing it. For
example, being the minimum edge on a given path is a local property. Local
properties lend themselves nicely to bottom-up computations whereas non-local
properties appear to be more challenging.

For our general non-local searching, the user should supply a function Select
that, given the root cluster of a top tree, selects one of the two children.
Recall here that a root cluster represents the whole underlying tree, which is
important when dealing with non-local properties. Our black box will use Select
to guide a binary search for a desired edge.

Theorem 2 (Non-Local Search) Given a top tree, after $O(\log n)$ calls to
Select, Merge, and Split, there is a unique edge $(v, w)$ contained in all
clusters chosen by Select, and then $(v, w)$ is returned.

As stipulated in the general interface to top trees, the driver behind
Theorem 2 will only manipulate the top tree with Merge and Split operations.

Before proving Theorem 2, we apply it to the center and median problems.
Our general approach is to first decide the information needed for Select, and
second show how to make the information available.

3.1 Dynamic Center 

For any tree $T$ and node $v$, let $h_v(T)$ denote the length of the longest
path from $v$ in $T$. A center is a node $v$ minimizing $h_v(T)$.

Lemma 3 Let $T$ be a tree, and let $A$ and $B$ be neighboring clusters with
$A \cap B = \{c\}$ and $A \cup B = T$. If $h_c(A) \ge h_c(B)$, $A$ contains all
centers.

Proof: Let $w$ be a node in $A$ of maximal distance to $c$. Then
$dist(c, w) = h_c(A) = h_c(T)$. Now, if $v \in B \setminus A$,
$h_v(T) \ge dist(v, w) = dist(v, c) + dist(c, w) = dist(v, c) + h_c(T)$. Since
the edge weights are positive, $dist(v, c) > 0$, so $v$ cannot be a center
minimizing $h_v(T)$. □
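
Lemma 3 can be checked by brute force on a small example. The sketch below is only a sanity check with hypothetical helper names and an invented example tree; it recomputes all distances from scratch and has nothing to do with the efficient maintenance described in this section.

```python
from collections import defaultdict


def adjacency(edges):
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def distances(adj, source):
    """Distances from source to every node of a weighted tree."""
    dist = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def height(edges, v):
    """h_v: length of the longest path starting at v."""
    return max(distances(adjacency(edges), v).values())


def centers(edges):
    adj = adjacency(edges)
    h = {v: max(distances(adj, v).values()) for v in list(adj)}
    best = min(h.values())
    return {v for v, hv in h.items() if hv == best}


# Two neighboring clusters sharing the node "c" (weights are edge lengths).
A = [("a", "b", 3), ("b", "c", 4)]
B = [("c", "d", 2), ("c", "e", 1)]
```

Here the longest path from c inside A has length 7 and inside B has length 2, so the lemma predicts that every center of the whole tree lies in A.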

For every cluster $C$ with $\partial C = \{a, b\}$, we maintain:

- The distance between the boundary nodes: $dist(C)$

- The maximal distance in $C$ from each boundary node: $h_a(C)$, $h_b(C)$

Thus, for a new edge $(v, w)$ with weight $x$, we just set
$dist((v,w)), h_v((v,w)), h_w((v,w)) := x$. Consider merging $A$ and $B$ in
$C$. To get $dist(C)$ we just sum $dist(D)$ for each path child
$D \in \{A, B\}$ of $C$ (in Fig. 1 we have two path children in (1), one in
(2), and none in (3,4)). For each $a \in \partial C$, we compute $h_a(C)$ as a
maximum over the two cluster children $A$ and $B$ of $C$, depending on which
the node furthest from $a$ is found in. If $a \in \partial A$, the maximal
distance to a node in $A$ is $h_a(A)$. If $a \notin \partial A$ and
$\{c\} = A \cap B$, we have to pass $B$ to get to $c$, so the maximal distance
to a node in $A$ is $dist(B) + h_c(A)$. Splitting a node does not require
moving any information, so we conclude that we can maintain the above
information for the clusters in constant time per merge or split, hence in
$O(\log n)$ time per link or cut by Theorem 1.

We will now define Select given a root cluster $C$ with children $A$ and $B$,
$A \cap B = \{c\}$. If $h_c(A) \ge h_c(B)$, Select picks $A$, otherwise it
picks $B$. By Lemma 3, any cluster picked contains all centers, so following
Theorem 2, the returned edge $(v, w)$ contains all centers. Since Select takes
constant time, $(v, w)$ is found in $O(\log n)$ time.

To find out whether $v$ or $w$ is a center, we compute
$C := \mathrm{Expose}(v, w)$ in $O(\log n)$ time, using Theorem 1. Since $C$
coincides with $T$, we can return $v$ if $h_v(C) \le h_w(C)$; $w$ otherwise.

Theorem 4 The center can be maintained dynamically under link, cut and 
change of edge weights in O(logn) worst case time per operation. 

Proof: Since the above Merge, Split, and Select are supported in constant time, 
the time bound follows from Theorem El □ 

3.2 Dynamic Median 

Let $T$ be a tree with both positive node and edge weights. A median is a node
$m$ minimizing $\sum_{v \in T} weight(v) \cdot dist(v, m)$, where $dist(v, m)$
is the sum of costs of edges on the unique path from $v$ to $m$ in the tree.
For any tree $T$, let $w(T)$ denote the sum of node weights of $T$. Our
approach to finding medians is similar to that for centers, but for the median,
it is natural to allow the user to change node weights, and this requires a
simple trick.

The lemma below is implicit in Goldman ng.

Lemma 5 Let $(v, w)$ be an edge in the weighted tree $T$. Let $T_v$ and $T_w$
be the trees from $T \setminus \{(v, w)\}$ containing $v$ and $w$,
respectively. If $w(T_v) = w(T_w)$, $v$ and $w$ are the only medians in $T$,
and if $w(T_v) > w(T_w)$, all medians in $T$ are in $T_v$.

Corollary 6 Let $T$ be a tree, let $m$ be a median of $T$ and let $A$ and $B$
be neighboring clusters with $A \cap B = \{c\}$ and $A \cup B = T$. Then
$w(A) \ge w(B) \Rightarrow m \in A$.

Proof: Let $(c, w)$ be any edge in $B$ leaving $c$. Then $T_c \supseteq A$ and
$w(T_c) \ge w(A) \ge w(B) > w(T_w)$, so by Lemma 5 there are no medians in
$T_w$. It follows that there cannot be any medians in $B \setminus \{c\}$. □
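
As a sanity check of Lemma 5 and Corollary 6, the following brute-force sketch computes the weighted cost of every node and returns the medians. The helper names and the example weights are invented for illustration, and the quadratic recomputation is unrelated to the efficient method of this section.

```python
def weighted_costs(edges, weight):
    """Cost of each candidate m: sum over v of weight[v] * dist(v, m)."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))

    def dist_from(source):
        dist = {source: 0}
        stack = [source]
        while stack:
            u = stack.pop()
            for v, length in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + length
                    stack.append(v)
        return dist

    return {m: sum(weight[v] * d for v, d in dist_from(m).items()) for m in adj}


def medians(edges, weight):
    cost = weighted_costs(edges, weight)
    best = min(cost.values())
    return {m for m, c in cost.items() if c == best}


# Two neighboring clusters sharing the node "c"; all edges have length 1.
weight = {"a": 5, "b": 1, "c": 1, "d": 1, "e": 1}
A = [("a", "b", 1), ("b", "c", 1)]
B = [("c", "d", 1), ("c", "e", 1)]
```

In this example $w(A) = 7$ and $w(B) = 3$, so Corollary 6 places the median in A. Cutting the edge (a, b) leaves a side of weight 5 containing a and a side of weight 4 containing b, so Lemma 5 narrows it down to a.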

The above corollary suggests that we should maintain the weight of each
cluster, but this gives rise to a problem; namely that a single node can be
contained in arbitrarily many clusters, and a change in the node’s weight would
affect all these clusters.

Our solution is that for each cluster $C$, we only maintain its “internal
weight” $w^{\circ}(C) = w(C \setminus \partial C)$. This means that when we
merge two clusters $A$ and $B$, $A \cap B = \{c\}$, into $C$, we add their
internal weights plus the weight of $c$ if $c \notin \partial C$. The point is
that any node is only non-boundary in $O(\log n)$ clusters, and the internal
weight of these clusters is trivially updated in $O(\log n)$ time if a node
weight changes.
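
The internal-weight bookkeeping can be sketched as below. As before, this is an illustration under assumptions: boundary sets are passed in by hand, the dictionary `home` stands in for the pointer $C(v)$ restricted to non-boundary nodes, and all names are hypothetical.

```python
class Cluster:
    def __init__(self, boundary):
        self.boundary = frozenset(boundary)
        self.parent = None
        self.internal = 0      # total weight of the non-boundary nodes


def merge(a, b, boundary, weight):
    c = Cluster(boundary)
    a.parent = b.parent = c
    (shared,) = a.boundary & b.boundary      # the single shared node
    c.internal = a.internal + b.internal
    if shared not in c.boundary:             # the shared node becomes internal
        c.internal += weight[shared]
    return c


def total_weight(c, weight):
    """Real weight: internal weight plus the (at most two) boundary nodes."""
    return c.internal + sum(weight[v] for v in c.boundary)


def change_weight(v, new_weight, weight, home):
    """home[v] is the smallest cluster in which v is a non-boundary node."""
    delta = new_weight - weight[v]
    weight[v] = new_weight
    c = home.get(v)            # None if v is an external boundary node
    while c is not None:       # walk up through the ancestors
        c.internal += delta
        c = c.parent
```

A weight change touches only the clusters on one root path, which is where the logarithmic update time comes from when the top tree has logarithmic height.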

Since clusters have at most two boundary nodes, for a given cluster, we can
always compute the real weight from the internal weights in constant time, and
hence we can implement Select choosing the heavier cluster in constant time,
getting an edge $(v, w)$ which contains the median in $O(\log n)$ time.

To find out which of $v$ and $w$ is the median, we apply Lemma 5. We cut
the edge $(v, w)$, and return $v$ if the (root cluster of the) tree containing
$v$ is heavier; $w$ otherwise. Before returning $v$ or $w$, we link $(v, w)$
back in $T$. The link and cut take $O(\log n)$ time, so we conclude:

Theorem 7 The median can be maintained dynamically under link, cut and
change of edge/node weights in $O(\log n)$ worst case time per operation.



3.3 Non-local Search Implementation 

We will now prove Theorem 2. Essentially our search will follow a path down
the given top tree $\mathcal{T}$. As we search down, we will modify the top
tree so as to facilitate calls to Select, but we will end up restoring it in
its original form. Thus, when we start the search, we assume that some driver,
as in Theorem 1, provides a top tree of height $O(\log n)$. It is convenient to
assume that there is at least one external boundary node so that all clusters
have at least one boundary node. During the search, we manipulate the top tree,
but we end up returning it to the driver in exactly the same shape as we got
it. All modifications for the search are done via Split and Merge, as
stipulated in the general interface to top trees.

Our search consists of $O(\log n)$ iterations $i = 0, \ldots$. At the beginning
of iteration $i$, there will be a cluster $C_i$ on depth $i$ in the original
top tree which contains exactly the edges that have been in all clusters
selected so far. If $C_i$ is a single edge $(v, w)$, we return $(v, w)$.
Otherwise $C_i$ has children $A_i$ and $B_i$ in the original tree. Select will
then be presented a root cluster with children $A^*$ and $B^*$ such that
$A_i \subseteq A^*$ and $B_i \subseteq B^*$. If the user selects $A^*$, we have
$C_{i+1} = A_i$ for the next iteration. Otherwise $C_{i+1} = B_i$.

To simplify the description of the generation of $A^*$ and $B^*$, define the
top tree of a singleton node $c$ to be the cluster consisting of that node.
Moreover, if $c$ is a boundary node of a cluster $C$, define a merge with $c$
to be neutral for $C$, that is, $C$ remains a root cluster which is returned by
the merge.

Now, let $\partial C_i = \{a, b\}$, $a \in A_i$, and $b \in B_i$. Here,
possibly, $a = b$. At the beginning of iteration $i$, we have three root
clusters, $C_i$, $\hat{A}_i \ni a$, and $\hat{B}_i \ni b$, partitioning $T$ in
the sense that they contain all edges and only overlap in $a$ and $b$. In the
first iteration, we have $C_0 = T$, $\hat{A}_0 = \{a\}$ and
$\hat{B}_0 = \{b\}$. For any $i$, we call the user-defined Select with the root
cluster obtained as

Merge(Merge($\hat{A}_i$, $A_i$), Merge($B_i$, $\hat{B}_i$))

By symmetry, we may assume that the user selects Merge($\hat{A}_i$, $A_i$). We
then split the newly created root cluster, as well as
Merge($\hat{A}_i$, $A_i$), and then we have the three root clusters
$\hat{A}_{i+1} = \hat{A}_i$, $C_{i+1} = A_i$, and
$\hat{B}_{i+1} = \mathrm{Merge}(B_i, \hat{B}_i)$ ready for iteration $i + 1$.

As mentioned, the iterations stop as soon as we arrive at a $C_i$ which is just
a single edge $(v, w)$. Since the height of the top tree before the search is
$O(\log n)$ and since each iteration only involves a constant number of Merge
and Split, we conclude that the total number of Merge and Split is
$O(\log n)$. At the end, when we have found $C_i = (v, w)$, we just reverse all
Merge and Split to restore the top tree in its original form, and return the
edge $(v, w)$.



4 Methodological Remarks 

Our main results in Theorems 4 and 7 could also have been achieved based on
either Sleator and Tarjan’s dynamic trees [22] or Frederickson’s topology
trees [TTWTTj . However, we claim that the derivation from these more classical
data structures would have been more technical.



Sleator and Tarjan’s dynamic trees Sleator and Tarjan provide an axiomatic
interface for their dynamic trees [22] where the user can choose a root with a
so-called Evert operation, and then, for any specific node, add weights to all
edges on the path to the root, or ask for the minimum of all weights on this
path. This is basically the interface we implemented with top trees at the end
of Section 2, assuming that we expose both the desired root and the specified
node.

Before discussing limitations to the above interface, we first illustrate its
generality, viewing the min-query as representing an arbitrary associative
operator $\oplus$. Suppose, for example, as in [22], that we want to implement
parent pointers to the current root. We then let the weight of an edge be its
pair of end-points and define $a \oplus b = a$. Then the “min”-query returns
the end-points of the first edge on the path to the root, from which we
immediately get a parent pointer.

Unfortunately, the above axiomatic interface has been found too limited for
many applications of dynamic trees, and instead authors have worked directly
with the Sleator and Tarjan’s underlying representation [31 MotZ I I24I231 1 4141 1 1 1 tij . 
TOE21- In particular, this is the case for the previous solutions to the
dynamic center p] and median problems [3], and we believe part of the reason
for their worse bounds and more complex solutions is difficulties in working
directly with Sleator and Tarjan’s underlying representation.

Of course, one may try to increase the applicability of the axiomatic interface
by augmenting it with further operations. For example, I2H] shows how to find a
minimum weight node in a subtree. However, dealing with non-local properties
is not so immediate, and we find it unlikely that we will ever converge to a
set of operations so big that we can forget about the underlying
representation.

The approach with top trees has instead been concentrated on designing a 
representation which is very easy to deal with directly. For example, to compute 
the minimum node of a given subtree as in since we can insert and delete 
edges, this is equivalent to maintaining the minimum node of each tree in a 
dynamic forest, and this is again done by maintaining, for each cluster, the 
minimum weight over its non-boundary nodes. Since each node is only non- 
boundary in O(logn) clusters, weight changes of nodes are trivially supported. 
If we do not expose any external boundary nodes, the root cluster will store the 
desired minimum. 

Frederickson’s topology trees Top trees are very similar to Frederickson’s
topology trees HDE3, from which they were originally derived. The essential
difference is that the clusters of topology trees are not connected via nodes,
but via edges. Since Frederickson’s boundary consists of edges, he cannot have
bounded boundaries for unbounded degree trees. Thus, in applications for
unbounded degrees one has to code these with ternary trees; a quite standard
process whose validity has to be verified for the individual application. Even
if we assume we are dealing with ternary trees, topology trees still have
clusters with up to three boundary edges instead of just two boundary nodes.
Also, a topology merge combines two clusters plus the edge between them,
whereas a top merge just unites two neighboring clusters. Neither of these
issues leads to fundamental difficulties, but in our experience, they lead to
significantly more cases.

Henzinger and King’s ET-trees For completeness, we also mention Henzinger
and King’s ET-trees m-. These are standard binary trees over the Euler tour
of a tree. This technique is much simpler to implement than those mentioned
above, and it can be used whenever we are interested in maintaining a min 
over the edges or nodes of a tree, where the min may be interpreted as any 
associative and commutative operation. Thus, the above mentioned result from 
El on maintaining the minimum weight node of a tree is immediate, and in 
fact, this was pointed out before El in m- However, the ET-trees cannot be 
used to maintain any of the path information discussed so far. Also, they cannot 
be used to maintain medians and centers. 

5 Concluding Remarks 

We have presented new and better bounds for maintaining medians and centers 
in dynamic trees. The results were obtained based on top trees, and we have 
argued more generally, that top trees in many instances would be the preferred 
data structure for solving problems on dynamic trees. 

A top driver as described in Theorem H can be implemented by reduction to 
either Sleator and Tarjan’s techniques for dynamic trees or Frederickson’s
techniques for topology trees. The latter option was pointed out in [3]. We
are currently experimenting with these different implementations to see which
leads to the better performance.

Abstract. The dynamic maintenance of the convex hull of a set of points in
the plane is one of the most important problems in computational geometry.
We present a data structure supporting point insertions in amortized
O(log n · log log log n) time, point deletions in amortized
O(log n · log log n) time, and various queries about the convex hull in
optimal O(log n) worst-case time. The data structure requires O(n) space.
Applications of the new dynamic convex hull data structure are improved
deterministic algorithms for the k-level problem and the red-blue segment
intersection problem where all red and all blue segments are connected.



1 Introduction 

The problem of maintaining the convex hull of a set of points in the plane
under the insertion and deletion of points is one of the most important
problems in computational geometry. A dynamic data structure for maintaining
the convex hull of a point set has numerous applications, e.g. in algorithms
solving the k-level problem and the red-blue segment intersection problem
where all red and all blue segments are connected. Further applications are
given in the literature.

Overmars and van Leeuwen in 1981 gave a solution for the fully dynamic
convex hull problem supporting point insertions and deletions in O(log^2 n)
time, where n is the maximum number of points in the set. The data structure
of Overmars and van Leeuwen stores the convex hull in a search tree, and
typical queries on the convex hull are supported in O(log n) time. Preparata
and Vitter gave a simpler approach achieving the same bounds as Overmars and
van Leeuwen. Until recently no progress had been made on improving the update
bounds for the general case. Then, in 1999, Chan presented a data structure
that achieves amortized O(log^{1+ε} n) update time, where ε > 0 is an
arbitrary constant, and O(log n) query time for various types of queries,
e.g. membership and tangent-finding.

For special cases better update bounds are known. For the semi-dynamic
case where only insertions are allowed, it is easy to achieve O(log n)
insertion time. For the other semi-dynamic case, where only deletions are
allowed after preprocessing, Hershberger and Suri achieved O(n log n)
preprocessing time and amortized O(log n) deletion time. For the off-line
case, where the sequence of updates is given in advance, a data structure
using O(n log n) time for processing a sequence of n updates is known. The
case where the sequence of updates is random has also been considered, and it
was shown how to achieve expected O(log n) update time.

In this paper, we first give a new data structure for the semi-dynamic
problem where only deletions are allowed after preprocessing, by extending
the construction of Hershberger and Suri. Provided that the initial point set
is given lexicographically sorted, we achieve amortized O(n) preprocessing
time and amortized O(log n · log log n) deletion time. The data structure
requires O(n) space. Our main result for the fully dynamic case is a
transformation strategy that combines a fully dynamic data structure with a
semi-dynamic data structure for the deletions-only case, and generates a new
fully dynamic data structure. The construction is based on the construction
of Chan combined with several new ideas. Let U(n) and D(n) be two
nondecreasing positive functions, where U(n) ≥ log n and D(n) ≥ log n. If
there exists a fully dynamic data structure with amortized O(U(n)) update
time and worst-case O(log n) query time, and a semi-dynamic data structure
with O(n) preprocessing time and amortized O(D(n)) deletion time, then the
transformation yields a data structure with amortized
O(U(log^4 n) · log n / log log n) insertion time, amortized O(D(n)) deletion
time, and worst-case O(log n) query time. The queries that can be supported
are: find the extreme point on the convex hull in a given direction; report
whether a given line intersects the convex hull; report if a given point is
contained in the interior of the convex hull; find the two points adjacent to
a point on the convex hull; and given an exterior point find the two tangent
points on the convex hull from the point.

Combining our semi-dynamic data structure with the fully dynamic data
structure of Overmars and van Leeuwen, we immediately get amortized
O(log n · log log n) deletion and insertion time. By bootstrapping, we can use
the resulting data structure as the fully dynamic data structure in the
construction, and the insertion time reduces to amortized
O(log n · log log log n), while the deletion time remains amortized
O(log n · log log n).

We note that a semi-dynamic data structure with O(n) preprocessing time
and O(log n) deletion time would, for any constant k, imply a fully dynamic
data structure with amortized O(log n · log^(k) n) insertion time, amortized
O(log n) deletion time, and worst-case O(log n) query time, by k − 1
applications of our transformation strategy, using the data structure of
Overmars and van Leeuwen as the initial fully dynamic data structure.

The paper is organized as follows. Section 2 contains a description of the
semi-dynamic data structure for the deletions-only case. The subsequent
sections, starting with Sect. 3, contain the results for the fully dynamic
case and applications of the fully dynamic data structure.



¹ We let log^(1) n = log n, and log^(i+1) n = log log^(i) n for i ≥ 1.

Fig. 1. The convex hull CH(P) of a set of points P can be partitioned into an
upper hull UH(P), a lower hull LH(P), and possibly two vertical lines.

Fig. 2. Deletion of the point p from the upper hull implies that p is replaced
by the sequence of points p_1, p_2, p_3.


Notation 

Given a set of points P in the Euclidean plane, we let CH(P) ⊆ P denote the
set of points on the convex hull of P, and UH(P) and LH(P) denote respectively
the upper and lower hull of CH(P). Figure 1 shows the upper and lower hulls
of a set of points. In the following we restrict our attention to the upper
hulls of the sets of points, and assume for the sake of simplicity that points
are in general position, i.e. all points have distinct x-coordinates and no
three points are on a line. The results for the convex hull problems follow
immediately from the results on the upper hulls.

2 Semi-dynamic Data Structure 

In this section we give a data structure for the semi-dynamic problem with
amortized O(n) preprocessing time, and which supports point deletions in
amortized O(log n · log log n) time. To achieve linear preprocessing time we
require points to be given lexicographically sorted. The data structure
supports the operations:

Build Given a lexicographically sorted set P containing n points, builds a data 
structure for P and returns the points on UH(P) from left-to-right. 

Delete Deletes a point p from P, and returns the changes to UH(P), i.e. if p
was contained in UH(P) before the deletion then the sequence of new points
on UH(P) is returned from left-to-right (see Fig. 2).

Our result for the semi-dynamic problem is the following. 

Theorem 1. There exists a data structure supporting Build in amortized O(n)
time and Delete in amortized O(log n · log log n) time. The data structure
requires O(n) space.

In the following we assume without loss of generality that n ≥ 4, such that
log log n ≥ 1. Let P = {p_1, p_2, ..., p_n} be the initial set of points,
where p_i < p_{i+1} for 1 ≤ i < n, and let B = ⌈log n⌉ and N = ⌈n/B⌉. We
partition P into a sequence of blocks P_1, ..., P_N, each of size B except
for P_N, where P_i = {p_{1+(i−1)B}, p_{2+(i−1)B}, ..., p_{min(iB,n)}}, for
1 ≤ i ≤ N. After a sequence of Delete operations we let P̂ ⊆ P denote the set
of points which have not been deleted so far, and similarly for P_1, ..., P_N
we define P̂_1, ..., P̂_N.

For each block P_i, the points P̂_i are stored in sorted order in a linked
list, UH(P̂_i) is stored as a perfectly balanced binary tree, and furthermore
the points from left-to-right on UH(P̂_i) are kept in a doubly linked list.

Since |P̂_i| ≤ B, the upper hull UH(P̂_i) can be constructed by a linear sweep
of P̂_i in O(B) time, see e.g. [21, Sect. 1.1]. The balanced tree and the
doubly linked list storing UH(P̂_i) can therefore be recomputed in O(B) time
when a point is deleted from block P_i.
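
As a minimal sketch of the block partition and the linear sweep, assuming points are (x, y) tuples sorted by x with distinct x-coordinates and that logarithms are base 2: the function names are illustrative, and the balanced tree and linked lists of the actual block structure are replaced by plain Python lists.

```python
import math


def cross(o, a, b):
    """Positive if o -> a -> b is a left turn, negative for a right turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(sorted_points):
    """Upper hull, left to right, of points sorted by x-coordinate.

    Each point is pushed once and popped at most once, so the sweep is
    linear in the number of points."""
    hull = []
    for p in sorted_points:
        # On an upper hull consecutive points make right turns; remove
        # the middle point of any left turn (or collinear triple).
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def partition_into_blocks(sorted_points):
    """Split the sorted points into blocks of size B = ceil(log n);
    only the last block may be smaller."""
    n = len(sorted_points)
    B = max(1, math.ceil(math.log2(n)))
    return [sorted_points[i:i + B] for i in range(0, n, B)]
```

Recomputing a block after a deletion then amounts to calling `upper_hull` on the remaining points of that block, which is the O(B) rebuild used above.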

The blocks P_1, ..., P_N are stored from left-to-right at the leaves of a
perfectly balanced binary tree T with height ⌈log N⌉. For each node v in T,
we let T_v denote the subtree of T rooted at v, and let P̂_v denote the union
of the sets P̂_i stored at the leaves of T_v. It is easy to see that
UH(P̂_v) ∩ UH(P̂_i) is either empty or a consecutive subsequence of UH(P̂_i).
At each node v of T we store UH(P̂_v) as a doubly linked list L_v of
block-records, such that for each block P_i contributing to UH(P̂_v), i.e.
UH(P̂_v) ∩ UH(P̂_i) ≠ ∅, we have a block-record r_{v,i}. For each block-record
r_{v,i} we store pointers to the leftmost and rightmost points in UH(P̂_i)
which are also in UH(P̂_v). For a block P_i, let v_0, v_1, ..., v_k be the
prefix of the nodes in T on the path from the leaf v_0 storing P_i to the
root, where UH(P̂_i) ∩ UH(P̂_{v_j}) ≠ ∅, i.e. r_{v_j,i} ∈ L_{v_j}. For
0 ≤ j < k, we store with r_{v_j,i} an up-pointer to r_{v_{j+1},i}. This
representation allows us to efficiently navigate UH(P̂_v) in both directions
from point-to-point and block-to-block in constant time. Note that UH(P̂) is
stored at the root of T.

Since each block requires O(B) space the total space for the N blocks is
O(N · B). Since P is partitioned into N blocks, the total space for the lists
of block-records at each level of T is at most O(N). The total space required
is O(N · B + N · log N) = O(n).

We now turn to the implementation of the operations. For Build the input
set P is first partitioned into N blocks, and for each block the upper hull is
computed by a sweep line algorithm in O(B) time and each block structure is
initialized in O(B) time. The construction time for all blocks is
O(n + N · B) = O(n). The tree T is then processed bottom-up level by level.
Assume a node v has two children w_1 and w_2, and L_{w_1} and L_{w_2} have
already been computed (for a leaf ℓ, we define L_ℓ to only contain one
block-record with pointers to the first and last node of UH(P_ℓ)). First we
let L_v be the concatenation of L_{w_1} and L_{w_2}. The resulting list of
block-records represents a sequence of points forming a convex curve except
possibly at one point p, namely the last point from CH(P_{w_1}) or the first
point from CH(P_{w_2}), i.e. there is a pointer to p in one of the
block-records in L_v.

To fix this problem we apply the standard method used in convex hull
construction algorithms: while we have a non-convex point p in the list of
points, i.e. p together with its predecessor and successor point in the list
form a left-turn, we remove p from the list. Removing p is done as follows: if
p is in block P_i, and p is the only point from UH(P_i) in the list, i.e. both
pointers in r_{v,i} point to p, we remove r_{v,i} from L_v. Otherwise we
replace the pointer to p in r_{v,i} by a pointer to the next point in UH(P_i)
in the direction of the point given by the other pointer in r_{v,i}, where we
utilize that the points in UH(P_i) are kept in a doubly linked list. We can
remove a point at most once in the bottom-up preprocessing of T, and the time
for preprocessing one level of T is O(N) plus the time used to eliminate left
turns. The total time for constructing all L_v lists becomes
O(n + N · log N) = O(n). It follows that Build takes O(n) time.
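
A minimal sketch of this concatenate-and-repair step, assuming the two upper hulls are given as plain lists of (x, y) points with every point of `left` to the left of every point of `right`. The function name is illustrative. Unlike the block-record lists described above, this version scans all points of the right hull rather than only the points near the junction, so it illustrates the left-turn elimination and not the stated running time.

```python
def cross(o, a, b):
    """Positive if o -> a -> b is a left turn, negative for a right turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def merge_upper_hulls(left, right):
    """Upper hull of two horizontally separated upper hulls.

    Conceptually the lists are concatenated and every point that forms a
    left turn with its current neighbours is removed."""
    merged = list(left)          # left is already convex, nothing to repair
    for p in right:
        while len(merged) >= 2 and cross(merged[-2], merged[-1], p) >= 0:
            merged.pop()         # non-convex point, removed for good
        merged.append(p)
    return merged
```

For example, merging `[(0, 0), (1, 3), (2, 4), (3, 3)]` with `[(4, 7), (5, 8), (6, 0)]` removes `(3, 3)` and `(2, 4)`, the two points that end up below the new bridge from `(1, 3)` to `(4, 7)`.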

Before describing the Delete operation, we observe that only upper hulls
actually containing p need to be updated (see Fig. 2). To perform Delete,
first make a binary search in O(log n) time locating the block P_i containing
p, assuming that P was given as an array of points or that we keep P in a
balanced search tree. In O(B) time we check if p ∈ UH(P̂_i). If p ∉ UH(P̂_i)
then no upper hull needs to be updated and it is sufficient to remove p from
the list of points in P̂_i in O(B) time. Otherwise p ∈ UH(P̂_i); let p^- and
p^+ be the predecessor and successor of p in UH(P̂_i) (if present), and
rebuild in O(B) time the data structure for block P_i after p has been deleted
from the list of points in P̂_i. What remains is to update all the upper hulls
which contained p. If p ∈ UH(P̂_v) for a node v then r_{v,i} ∈ L_v. But then
r_{v,i} is reachable from P_i using the stored up-pointers.

The reconstruction of upper hulls is done bottom-up in T. Consider a node v
and the effect of deleting p from UH(P̂_v). Let p_L and p_R be the two points
in P_i that r_{v,i} has pointers to, where p_L ≤ p_R. If p < p_L or p > p_R
then p ∉ UH(P̂_v) and we are done. If p_L < p < p_R then the changes to
UH(P̂_v) can only be between p_L and p_R, i.e. the updates are done locally in
block P_i and no changes are required for L_v. The complicated case is when
p = p_L or p = p_R. First we need to delete p from the upper hull stored at v.
If p_L = p_R then p was the only point from block P_i, and we delete r_{v,i}
from L_v. Otherwise we have two cases: if p = p_L then we replace the pointer
to p in r_{v,i} by a pointer to the successor p^+, and if p = p_R then we
replace the pointer to p in r_{v,i} by a pointer to the predecessor p^-.

After having deleted p from UH(P̂_v), we must insert new points onto
UH(P̂_v), as illustrated by Fig. 2. If p was not an endpoint of the bridge
connecting two points on the two upper hulls stored at the children of v (see
Fig. 3), then the changes to UH(P̂_v) are exactly the changes to UH(P̂_w),
where w is the child of v where p ∈ UH(P̂_w) before the deletion. It follows
that it is sufficient to create new and update existing block-records in L_v
with exactly the same pointers to points in blocks as done for L_w.

The final case is when p is an endpoint of the bridge connecting the upper
hulls stored at the children of v, as illustrated in Fig. 3. Assuming the new
bridge has been found, updating L_v with respect to the new points on
UH(P̂_v) consists of inserting a subsequence of the points from each of the
upper hulls stored at the children of v, by creating a sequence of new
block-records in L_v with the same information as stored at the two children
of v and changing at most four pointers in the block-records in L_v
corresponding to the ends of the subsequences copied.

Fig. 3. The bridge between two horizontally separated upper hulls. The dashed
lines show the changes to the left upper hull and the new bridge when deleting
point p.

To find the new bridge we apply a standard bridge searching algorithm,
with minor modifications. The standard bridge searching procedure keeps for
the upper hulls two candidate intervals, one for each of the endpoints of the
bridge, and performs a “simultaneous binary search” on both hulls, always
halving at least
one of the intervals. See e.g. uni Lemma 3.1] for further details. We replace the 
binary search by a linear block search on each of the two upper hulls. The linear 
block search at the left child proceeds left-to-right, always trying to advance one 
block, whereas the linear block search at the right child proceeds right-to-left. 
Whenever a search is advanced to the next block, a block-record is added to
L_v in O(1) time.

The search process for each upper hull first tries to advance a complete
block at a time, using the information stored at the block-records at the
children of v to always pick the last point in the next block P_i contributing
to the upper hull stored at that child. After having located the block P_i
containing one endpoint of the new bridge, the search then proceeds in a
binary fashion using the search tree storing UH(P̂_i). The total time for
finding a bridge becomes linear in the number of block-records created plus
O(log B). The output of Delete can be generated immediately from the changes
to L_{root(T)}.

The total time for a deletion becomes O(B + x + log N · log B), where x is
the total number of new block-records created. Since a deletion removes at
most one block-record from each level of T, it follows that D deletions delete
at most D · log N block-records. Since there can be at most O(N · log N)
block-records, it follows that the total time for D deletions is at most
O(D · B + N · log N + D · log N + D · log N · log B) =
O(n + D · log n · log log n). Since the O(n) term can be charged to Build, it
follows that Build takes amortized O(n) time and each Delete operation
amortized O(log n · log log n) time.

3 Fully Dynamic Data Structure 

For this part of the paper we change the point of view of the exposition to the 
dual problem and consider upper envelopes instead of upper hulls. This duality, 
as explained e.g. in |2J p. 167], maps points to lines and vice versa in a way, 
that preserves above/on/below relations. In this setting a set of points
becomes a collection of lines L, and the upper hull transforms to the upper
envelope of these lines, i.e. the collection of line segments such that points
on a segment are not below any other line. An extreme point query, i.e. given
a slope q find the point of the upper hull that has a tangent of slope q,
turns into a vertical line query, i.e. given a vertical line with
x-coordinate q, report the segment of the upper envelope crossing this line.
Note that this is really only a change in point of view. There is no need to
perform a computation to go from the original setting to the dual and back.
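
The correspondence between the two query types can be checked on a small example. The sketch below assumes one particular duality convention, mapping the point (a, b) to the line y = −a·x + b; this formula is an assumption made for illustration and is not spelled out in this section. Both queries are answered by brute force, since the point is only that they select the same object.

```python
def dual_line(point):
    """Assumed convention: point (a, b) becomes the line y = -a*x + b,
    represented as the pair (slope, offset)."""
    a, b = point
    return (-a, b)


def line_value(line, x):
    slope, offset = line
    return slope * x + offset


def extreme_point(points, q):
    """Primal query: the upper hull point with a tangent of slope q is the
    point maximising the intercept b - q*a of a line of slope q through it."""
    return max(points, key=lambda p: p[1] - q * p[0])


def vertical_line_query(lines, q):
    """Dual query: the line that is topmost at x-coordinate q."""
    return max(lines, key=lambda line: line_value(line, q))
```

Under this convention the value of the dual line of (a, b) at x = q equals b − q·a, which is exactly the quantity maximised by the extreme point query, so the two answers always correspond.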

We apply a standard dynamization technique that divides the current points
into sets and keeps one deletion-only data structure per set. Additionally
there is a more explicit representation of the current upper envelope, namely
an interval tree, that allows fast queries without requiring too much work for
updates. Inside the interval tree we have at each internal node a fully
dynamic upper envelope data structure, a so-called secondary structure. The
running time improvement relies on a polylogarithmic bound on the size of the
secondary structures.

The description so far fits the data structure proposed by Chan equally
well. Compared to that data structure we apply improved deletion-only data
structures. We also do some explicit grouping of the subenvelopes stemming
from the dynamization, such that the number of secondary structures storing
segments from one subenvelope is reduced.

The remainder of this section is devoted to proving the following theorem.

Theorem 2. Let U(n) and D(n) be two nondecreasing positive functions, where
U(n) ≥ log n and D(n) ≥ log n. Assume there exists a data structure for the
dynamic upper envelope problem supporting Insert and Delete in amortized
O(U(s)) time, and Vertical Line Query in worst-case O(log s) time, where s is
the total number of lines inserted. Assume further that there exists a data
structure for the semi-dynamic upper envelope problem supporting Build on a
lexicographically sorted list of n points in amortized O(n) time and Delete in
amortized O(D(n)) time, where n is the number of lines in the structure.

Then there exists a data structure for the dynamic upper envelope problem
supporting Insert in amortized O(log n · U(log^4 n) / log log n) time, Delete
in amortized O(D(n) + log n · U(log^4 n) / log log n) time, and Vertical Line
Query in worst-case O(log n) time, where n is the total number of lines
inserted.

Applying this theorem to the data structure of Overmars and van Leeuwen
with U(s) = log^2 s and the result from Sect. 2 with D(n) = log n · log log n,
we get Insert in O(log n · log^2(log^4 n) / log log n) = O(log n · log log n),
and Delete in O(log n · log log n). Applying the theorem again to this new
data structure improves Insert to O(log n · log log log n). The performance of
the deletion-only data structure is the bottleneck that renders further
applications of the theorem useless.
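
The two applications of Theorem 2 can be written out as follows. This is a worked version of the arithmetic stated above, under the assumption that after the first application U(s) is taken as the larger of the new Insert and Delete bounds, since the theorem requires one bound covering both operations.

```latex
\begin{align*}
\text{First application, } U(s) = \log^2 s:\quad
  \frac{\log n \cdot U(\log^4 n)}{\log\log n}
  &= \frac{\log n \cdot (4\log\log n)^2}{\log\log n}
   = O(\log n \cdot \log\log n). \\
\text{Second application, } U(s) = \log s \cdot \log\log s:\quad
  \frac{\log n \cdot U(\log^4 n)}{\log\log n}
  &= \frac{\log n \cdot 4\log\log n \cdot \log(4\log\log n)}{\log\log n}
   = O(\log n \cdot \log\log\log n). \\
\text{Delete in both cases:}\quad
  D(n) + \frac{\log n \cdot U(\log^4 n)}{\log\log n}
  &= O(\log n \cdot \log\log n).
\end{align*}
```

After the second application the Delete bound is still O(log n · log log n), so the common update bound remains U(s) = log s · log log s and a third application reproduces the same Insert bound.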

For the purpose of describing our data structure, we separate it into
several layers. We first describe the layers in a top-down fashion, starting
with a data structure that solves the fully dynamic upper envelope problem
using some auxiliary data structures. For the analysis we proceed in a
bottom-up fashion, i.e. we always analyze the auxiliary data structure first.
This avoids any forward references.

3.1 The Interfaces

Fully dynamic upper envelopes. 

Insert Insert a line, given by the parameters a and b in the representation y = 
ax + b. Return a pointer to a new line data structure. 

Delete Given a pointer to a line data structure, delete that structure and the 
line it represents. 

Query Given a value v, report the highest intersection of a line with the
vertical line given by x = v.
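
To make the interface concrete, here is a brute-force reference implementation, assuming lines are given by the parameters a and b of y = ax + b and that an integer handle plays the role of the returned pointer. The class name and handle scheme are illustrative. It answers a query in linear time and is meant only as a specification or test oracle, not as the data structure developed in this section.

```python
import itertools


class NaiveUpperEnvelope:
    """Reference semantics for Insert, Delete and Query."""

    def __init__(self):
        self._lines = {}                 # handle -> (a, b)
        self._handles = itertools.count()

    def insert(self, a, b):
        """Insert the line y = a*x + b and return a handle to it."""
        handle = next(self._handles)
        self._lines[handle] = (a, b)
        return handle

    def delete(self, handle):
        """Delete the line identified by the handle."""
        del self._lines[handle]

    def query(self, v):
        """Highest intersection of a stored line with the vertical line
        x = v, or None if no line is stored."""
        if not self._lines:
            return None
        return max(a * v + b for a, b in self._lines.values())
```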



Query structure Q. This data structure combines several independent upper 
envelopes. It is asserted (and could be easily checked), that the list of line seg- 
ments in fact form envelopes. It is also asserted, that a line is present in at most 
one set and has therefore at most one segment. 

There is an active set of segments that is considered for queries. For all lists of 
segments it is asserted, that the segments from this list form an upper envelope. 
A segment is given by a line and an interval on the x-axis. 

Init set with active envelope Given a lexicographically sorted list L of
lines and a list K ⊆ L of segments, initialize a set data structure that can
hold upper envelopes stemming from lines in L and insert K into the active
set. It is asserted that K forms a complete upper envelope. Return a pointer
to a new data structure representing the set.

Delete set Delete a set given by a pointer, removing all segments from the active 
set. 

Replace inside an envelope Given a pointer to a set, pointers to (up to) three 
segments and a lexicographically sorted list of segments K with 

K = Here and are the same segment with a changed left 

boundary, and and differ only in the right boundary. It is explicitly 
allowed that (.a and (.uj are void, with the meaning that ^ is unbounded to 
the left and respectively to the right. Replace the three segments by K in 
the active set. It is asserted that the active set forms an upper envelope after 
the replacement. 

Query Given a value v, report the highest intersection of an active segment
with the vertical line given by x = v.



Subenvelope structure T. This structure allows queries on a generalization
of segments, namely subenvelopes. A subenvelope is a lexicographically sorted
list of line segments where neighbors have precisely one point in common. We
will maintain a small upper bound on the size of a subenvelope. Again it is
asserted that the segments in fact are segments from upper envelopes.

Insert Given a list L of segments, insert the subenvelope formed by L. Return
a pointer to the newly created data structure of the subenvelope.

Delete Given a pointer to a subenvelope, delete that subenvelope. Return the
segments of the subenvelope.

Query Given a value v, report the highest intersection of an inserted
subenvelope with the vertical line given by x = v.



3.2 Dynamization 

Throughout the following we assume that we know the value of n, the total 
number of insert operations, in advance. Standard doubling techniques justify 
this assumption. 

Starting from the monotonic data structure presented in Sect. 2 we apply a
general dynamization technique for decomposable search problems attributed
to Bentley and Saxe. The idea is that we divide the set of lines L into a
partition 𝒞 based on the order in which the lines are inserted. More
precisely, every set C ∈ 𝒞 has a rank. If there are d sets of the same rank z,
we merge them into one new set of rank z + 1. Sets of rank 0 have size 1. We
choose the parameter d = ⌈log n⌉, leading to at most r = O(log_d n) =
O(log n / log log n) different ranks. This is also an upper bound on the
number of times a specific line can participate in the merge of d sets.
Furthermore the number e = |𝒞| of sets is bounded by e = O(rd) =
O(log^2 n / log log n). Every set has a deletion-only structure and a set in
the query structure attached.
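
The rank rule can be sketched in a few lines. The sketch assumes each set is kept as a sorted Python list and that merging is done by re-sorting, which stands in for the merge and Build steps of the actual construction; the function name and the dictionary layout are illustrative.

```python
def insert_with_ranks(sets_by_rank, item, d):
    """Insert `item` as a new set of rank 0 and merge while some rank
    holds d sets.

    sets_by_rank maps a rank z to the list of sets currently of rank z;
    a set of rank z has exactly d**z elements."""
    rank = 0
    sets_by_rank.setdefault(0, []).append([item])
    while len(sets_by_rank[rank]) == d:
        # d sets of the same rank are replaced by one set of the next rank.
        merged = sorted(x for s in sets_by_rank[rank] for x in s)
        sets_by_rank[rank] = []
        rank += 1
        sets_by_rank.setdefault(rank, []).append(merged)
```

After every insertion each rank holds fewer than d sets, which is where the bound e = O(rd) on the number of sets comes from.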

The merge operation first deletes all the involved sets from the query
structure Q. Then it orders the (dual) lines according to their slopes, which
corresponds to sorting the corresponding (primal) points according to their
x-coordinates. Here we exploit that the sets we are merging are already sorted
in that order. We use a heap of size d to iteratively find the remaining line
with smallest slope. Then we invoke the Build operation of the deletion-only
data structure, and use the reported upper envelope in an Init set operation
of the query structure Q. We attach the returned pointer to the new set.
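
The heap-based ordering step can be sketched as a d-way merge, assuming each set is a list of (slope, offset) pairs already sorted lexicographically. The function name is illustrative. The heap never holds more than one entry per input set, so each extracted line costs O(log d) time.

```python
import heapq


def merge_by_slope(sorted_sets):
    """Merge d sorted lists of (slope, offset) pairs into one sorted list
    using a heap with at most one entry per input list."""
    heap = [(s[0], i, 0) for i, s in enumerate(sorted_sets) if s]
    heapq.heapify(heap)
    merged = []
    while heap:
        line, i, j = heapq.heappop(heap)   # remaining line of smallest slope
        merged.append(line)
        if j + 1 < len(sorted_sets[i]):
            heapq.heappush(heap, (sorted_sets[i][j + 1], i, j + 1))
    return merged
```

The merged list is what would then be handed to Build.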

For an Insert(ℓ) we create a new record for ℓ that keeps the coordinates
(slope and offset) and also a pointer p_ℓ to the set of 𝒞 that currently
contains ℓ. Then we create a new set of size 1 and rank 0 and perform the
necessary merge operations. During the merge operations we update the pointers
p_ℓ for all lines we move.

If we want to delete a line ℓ we look up the set C ∈ 𝒞 that contains ℓ, and
then we invoke the Delete(ℓ) operation of the deletion-only data structure
from Sect. 2. This returns a list of new segments, which implicitly also gives
the two neighbors of ℓ. With this information we call the Replace inside an
envelope operation of Q.



3.3 Grouping 

Now we implement the query structure using only a Subenvelope structure. We
choose a block size parameter b = ⌈log n / log log n⌉.

The Init set with active envelope operation first deletes all pointers to
blocks on the lines of the set. Then it groups the segments of K equally into
as few blocks as possible of size at most b. It inserts the resulting
subenvelopes and stores the subenvelope pointer at every line.
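
A minimal sketch of the grouping rule, assuming the segments are given as a Python list in sorted order. The function name is illustrative. It uses the smallest possible number of blocks and spreads the segments as evenly as possible over them.

```python
def group_equally(segments, b):
    """Split `segments` into as few consecutive blocks as possible, each of
    size at most b, with block sizes differing by at most one."""
    m = len(segments)
    if m == 0:
        return []
    k = -(-m // b)                    # ceil(m / b): fewest blocks possible
    base, extra = divmod(m, k)        # `extra` blocks get one more segment
    blocks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        blocks.append(segments[start:start + size])
        start += size
    return blocks
```

With this rule, whenever more than one block is produced every block is at least half full, since k blocks are only used when more than (k − 1)·b segments are present.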

The Delete set operation walks along the set, deleting blocks pointed to by 
the lines and deleting the pointers as well. 

The Replace inside an envelope operation looks up the blocks where the three
lines are stored. Then it deletes the subenvelopes pointed to, building a list
L of segments that got deleted. In this list we replace the three segments by
K. Then we group L optimally into blocks of size b. We insert the blocks and
update the block pointers.

The query gets directly handed over. This is correct, as all active segments 
are in some block. 

3.4 The Interval Tree T for Subenvelopes

We implement the subenvelope structure as an interval tree. The interval tree
T is a rooted tree. We assume that the number M of leaves of T is known. We
choose the degree parameter B = ⌈log n⌉. We keep T balanced by maintaining the
invariants that the degree of a node is at most 2B − 1 and at least 2 at the
root and at least B for all the other internal nodes. All leaves have the same
distance to the root. A leaf ℓ of T stores a (possibly unbounded) interval
I_ℓ, its range. Every internal node u of T stores its range I_u, the interval
that is the (disjoint) union of the ranges of its children. To deal with a
non-constant degree of a node we maintain a dictionary (balanced tree) of the
endpoints of the ranges of its children. For an arbitrary interval I we say
that the node u of T corresponds to I if the range of u contains the interval,
i.e. I ⊆ I_u, and for none of the children v of u it is the case that I is
contained in the range I_v of v. Note that there is always a unique node of T
corresponding to an interval. We can find all the intervals containing a
certain point p on the path from the root node to the leaf that contains p. We
assert that the range of every leaf node contains at most one endpoint of the
stored intervals.

We store subenvelopes at the node in T that corresponds to their interval,
i.e. the extent along the x-axis. We store the segments of the subenvelope in
the secondary structure at that node, i.e. as lines in a fully dynamic upper
envelope structure.

The Insert operation creates a record that has a list of the lines forming
the subenvelope, the interval, and a pointer to the node of T. A pointer to
this record is returned. It inserts the interval into T, finds the node u in T
corresponding to the interval, and inserts all the lines into the secondary
structure S_u. It stores the returned identifiers in a list in the newly
created record.

As we have the strong restriction that the range of a leaf should contain at
most one endpoint of an interval stored in the tree, we might be forced to
split nodes of T in a bottom-up fashion. Assume that node u of T has too many
children. Then we create a new right sibling v of u (creating a new root if u
was the root) and move the right half of the children of u to v. We walk
through the list of blocks being stored at u. For a block w we have to decide
whether it should stay at u, be moved to v, or be moved up to the parent p of
u and v. If necessary we delete all the lines of w from the secondary
structure S_u of u. If the block moves to v we insert the lines into S_v. If
it moves up to p, we keep the block w “on hold”, in case p also gets split.
During this we update the pointers between the nodes of T and the records of
blocks.

If a subenvelope has the interval ]−∞, ∞[, it gets stored at the root of T,
and it cannot cause any splits. We call such a subenvelope trivial. M accounts
only for non-trivial subenvelopes.

For the Delete operation we remove all the lines from the secondary structure.

For a Query operation with value x, we determine the path p in T from the
root to the leaf v of T whose range contains x. For all nodes u on p we
perform an upper envelope query for x on the secondary structure S_u. We
report the topmost of the answers.

This answer is correct, because the block of the topmost segment at x is
stored at one of the nodes on the path from the root to the leaf v that
contains x.
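
The notions of a corresponding node and of the query path can be sketched on a small static tree. The sketch assumes closed ranges, a fixed tree without splits, and a plain list of (slope, offset) lines in place of each secondary structure; the class and function names are illustrative and none of the balancing described above is modelled.

```python
class Node:
    def __init__(self, lo, hi, children=()):
        self.lo, self.hi = lo, hi        # the range of the node
        self.children = list(children)
        self.lines = []                  # stand-in for the secondary structure


def corresponding_node(root, lo, hi):
    """Lowest node whose range contains [lo, hi]: descend while some child
    still contains the whole interval."""
    node = root
    while True:
        for child in node.children:
            if child.lo <= lo and hi <= child.hi:
                node = child
                break
        else:
            return node


def insert_subenvelope(root, lo, hi, lines):
    """Store the lines of a subenvelope at the node corresponding to its
    interval and return that node."""
    node = corresponding_node(root, lo, hi)
    node.lines.extend(lines)
    return node


def query(root, x):
    """Topmost value at x over all lines stored on the root-to-leaf path
    of x, or None if that path stores no line."""
    best, node = None, root
    while node is not None:
        for a, b in node.lines:
            y = a * x + b
            if best is None or y > best:
                best = y
        node = next((c for c in node.children if c.lo <= x <= c.hi), None)
    return best
```

A subenvelope whose interval contains x is stored at a node whose range contains x, so it is always met on the path walked by `query`; subenvelopes stored off the path cannot contain x.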

3.5 Analysis 

Bound on the number M of nontrivial subenvelope inserts. We have 
to bound the number of operations on blocks performed within the query struc- 
ture Q. 

At the init operation we give every line a fractional coin that allows it to
participate as a fraction 2/b in a non-trivial insert operation, i.e. we need
b/2 such coins to pay for a non-trivial insert. Then the init operation on a
set of size m costs us ⌈2m/b⌉ non-trivial subenvelope insert operations. If
the init operation gives rise to a nontrivial insert, it is paid for.

A replace operation is going to pay for 3 subenvelope deletions and 4 suben- 
velope insertions. If there are more blocks to be inserted, the blocks are definitely 
half full, and only 2 blocks on each end contain any lines that have already used 
their coins. The remaining block insertions can therefore be paid with coins. 

Knowing that one line can only cause one replace operation and participate 
in r init operations, we get a total account of M = 0(ji + n ■ r/b) = 0{n + n ■ 
log n/ log log n • log log n/ log n) = 0{n). 

Bound s on the size of secondary structures in 7”. For every set in C we 
have at most B subenvelopes stored at a node v. With the bounds on the size 
of subenvelope and on \C\ we get s = 0{B • 6 • e) = 0(log‘* n). 

A query takes $O(\log M + Q(s) \cdot h) = O(\log n + \log\log n \cdot \log n/\log\log n) = O(\log n)$
time.



Work in the split operations. Every split operation creates at least one new 
node. We will account on that node for all the insertions and deletions that 
happened during this single split. 

We charge the work of moving a block during a split operation entirely to 
the newly created node of T. For this we define the level of a node u of T by 
stating that leaves have level 0, and that the parent of nodes on level i has 
level i + 1. Now we observe that an interval stored at u has both endpoints
at some leaf below u. Hence the condition of having at most one endpoint of 
an interval per leaf implies that we have at most Ni = (2H)® intervals stored 
at a node of level i. Now let u be a node on level i. Then u was created by a 
split operation performed on one of its siblings v. So we know that v is also on 
level i and the split operation involved at most Ni intervals. Additionally we 
know e = 0{rd) = 0(log^ n/ log log n) which means for large n we have e < 
and that any node in T stores at most e ■ B < intervals. 

Adding these costs level by level in the tree, we get that the total number 
of intervals moved because of split operations is bounded by 0{{M / B)2B + 
(M/H2)4H2 + {M/B^)B^ + {M/B^)B^ + {M/B^)B^ + •••) = 0{M) = 0{n). 
We conclude that every subenvelope insertion causes, on average, a constant
number of moves of a subenvelope during split operations.

Running time of the update operations in T. Given the previous paragraph,
we conclude that an update operation of a nontrivial block in B takes
amortized $O(\log M + b \cdot U(s))$ time for finding the correct node in T and to pay for
the insertions and deletions of the segments, including during split operations.

Since $U(s) > \log s$, we have $b \cdot U(s) = \Omega(\log n/\log\log n \cdot \log\log n) = \Omega(\log n)$,
so the amortized time of a non-trivial block insert operation becomes $O(b \cdot U(s))$.

For trivial blocks it takes amortized O(U(s)) time per segment. Note that
even though the root node of T is special, the upper bound s on the number of
segments stored there applies as well.

Running time of the Query structure / Fully dynamic structure. In the
init operation of the query structure we account for 2/b nontrivial block insert
operations for every line in the set. We already argued that this is sufficient to
pay for the initial insert operation of that line (i.e. when the line appears on the
upper envelope of the set we just initialized). Accounting also for the possibility
of being inserted as part of a trivial block, we get a per line amortized time of
$O(U(s) + b \cdot U(s)/b) = O(U(s))$.

Knowing that every line gets initialized at most r times, we get an
amortized insert time for the fully dynamic data structure of
$O(r \cdot U(s)) = O(\log n/\log\log n \cdot U(s))$ as claimed in Theorem 2.

For the delete operation of the fully dynamic data structure we have to 
account for the delete operation in the deletion only structure, and for the replace 
operation in the query structure. As already argued, the replace operation has to 
account for a constant number of block update operations, yielding an amortized
time of $O(D(n) + b \cdot U(s)) = O(D(n) + \log n \cdot U(s)/\log\log n)$, the bound claimed
in Theorem 13 

4 Other Queries 

With the data structure for vertical line queries explained so far we can efficiently
answer a whole class of other queries on the upper envelopes. Assume the query
satisfies a so-called locality property, that is, for a vertical line q we can determine
on which side of q the answer lies by solely examining the highest line intersecting q.
Then we can use binary search to give an answer with O(log n) vertical line
queries, that is, in $O(\log^2 n)$ time. But this overhead is not always necessary. In
the next section we will give an important example where the already explained
data structure can be used to achieve an O(log n) query time for a more involved
query.

4.1 Arbitrary Line Queries 

The query we address is in the primal setting: given a point p in the plane report 
the two tangent lines through p touching the convex hull or state that the point 
is inside the convex hull. This corresponds in the dual to: given an arbitrary line, 
give the two intersection points of the line with the upper envelope, or “no” if 
no such intersection exists. The exposition here adopts the dual point of view. 
The important observation is, that our data structure has the same properties 
as the data structure in 0, the argument given there applies here as well. We 
only sketch the query algorithm in our setting. 

We use the following fact about arbitrary line queries to navigate in the 
interval tree of our data structure. 

Lemma 1. Let a and b be two walls and $E' \subseteq E$ a subset of lines s.t. the upper
envelope of E' at a and b coincides with the upper envelope of E. Assume that an
arbitrary line query for a line $\ell$ on E' results in the right intersection point t. If t
lies between a and b then also the right intersection T of $\ell$ with E lies between a
and b.

Let $\ell$ be the line query. The query algorithm starts at the root node of the
interval tree. It performs the right intersection query on the secondary structure
of the current node, updating the current answer. Then it descends to the child
corresponding to the interval the current answer lies in. When it reaches a leaf,
the current answer reflects the right intersection of $\ell$ with the upper envelope of
all lines.

Given that our secondary structures support line queries in O(log s) time,
we have an overall query time of
$O((\log B + \log s) h) = O(\log B \cdot \log n/\log B) = O(\log n)$.

5 Applications 

As a prominent example we consider the k-level of n lines, which is dually related
to the k-set question on n points. For this problem Edelsbrunner and Welzl |Zj
gave an algorithm using the data structure of Overmars and van Leeuwen that
constructs the k-level in $O(n \log n + m \log^2 n)$ time, where m is the size of the
k-level. Applying Chan's data structure this improves to $O(n \log n + m \log^{1+\varepsilon} n)$
time, and using our data structure this yields an improved
$O(n \log n + m \log n \log\log n)$ time bound. A randomized algorithm using expected
$O(\lambda_{t+2}(n + m) \log n)$ time has been given by Har-Peled |S|, where $\lambda_{t+2}(n + m)$
is the maximum length of a Davenport-Schinzel sequence of order t + 2 having
n + m symbols.

Basch, Guibas and Ramkumar Q considered a version of the segment intersection
problem: given a connected family R of n red line segments and a connected
family B of n blue line segments in the plane, report all intersecting pairs
from R × B. Chan ^ reported an improvement from $O((n + m) \log^3 n)$ time using
Overmars and van Leeuwen's data structure to $O((n + m) \log^{2+\varepsilon} n)$ using Chan's
data structure. We get a further improvement to $O((n + m) \log^2 n \log\log n)$.
Abstract. We propose an algorithm for maintaining a partition of dynamic
planar graphs motivated by applications in load balancing for
solving partial differential equations on a shared memory multiproces- 
sor. We consider planar graphs of bounded face sizes that can be modified 
by local insertions or deletions of vertices or edges so that planarity is 
preserved. In our paper we describe a data structure that can be updated 
in O(logn) time after any such modification of the graph, where n is the 
current size of the graph, and allows an almost optimal partition of a 
required size to be maintained. More precisely, the size of the separator
is within an $O(n^{\delta})$ factor of the optimal for the class of planar graphs,
where δ is any positive constant, and can be listed in time proportional
to its size. The dynamic data structure occupies O(n) space and can
initially be constructed in time linear in the size of the original graph.



1 Introduction 

Separator theorems are an efficient and widely used tool for the design of efficient
divide-and-conquer algorithms. Informally, a separator theorem claims that any 
graph from a given class can be divided into two or more parts of roughly the 
same size by removing a small number of vertices. The classical result of Lip- 
ton and Tarjan m shows that any n-vertex planar graph can be divided into 
components of no more than 2n/3 vertices by removing a set of no more than 
$\sqrt{8n}$ vertices. Moreover, such a separator can be found in O(n) time. Other
interesting results include separator theorems for the class of graphs of bounded
genus Elan . the class of graphs of excluded minor ^], and classes of geomet- 
ric graphs ini Separator theorems have applications in solving efficiently large 
sparse systems of linear equations ima . for developing algorithms for VLSI lay- 
out design 0I2|, for shortest path problems [Zj, in parallel computing pni, and 
in computational complexity m 
GR/M60750, and RTDF grant 98/99-0140. A two-page abstract of this work
appeared in the proceedings of CCCG'98.

Fig. 1. Mesh refinement operations: (a) adding a point inside a face;
(b) adding a point onto an edge; (c) adding an edge; (d) edge flipping.



A class of problems for which separator theorems are especially
well suited is data partitioning and load balancing for parallel computing.
Modern high-performance computing systems have a large number of processors
and their memory is distributed among the processors. In order to achieve high
efficiency and speed when using such computers, the data has to be allocated 
among the processors so that the computational load is even and the need for 
communication is minimized. Since this mapping problem is NP-hard, several 
approaches have been tried to find good approximate solutions. Popular parti- 
tioning techniques include Kernighan-Lin’s local search algorithm H31 , recursive 
spectral bisection jl^, simulated annealing [Tlj, and graph separators IIH]. The 
advantage of the graph-separator approach is that it works well for unstruc- 
tured meshes with good topology and that it gives a good guaranteed worst-case 
performance. 

In many time-dependent applications, after the initial partition, the data
might need to be modified and then reallocated to the processors. Thus the
problem of dynamic load balancing arises. For instance, in solving partial
differential equations, the mesh might need to be modified (refined or coarsened)
after every few time steps. For the case of triangular or bounded face size planar
meshes discussed in this paper, refinement types of modifications include adding
a new point or edge inside an existing face, adding a new point onto an existing
edge, edge or face flipping, and others (Figure 1). Efficient algorithms should use
the existing partition in order to compute the new one faster and with a small
number of data reallocations.

Unfortunately, there are no known deterministic algorithms that can recom- 
pute efficiently the separator after modification of the graph. The existing al- 
gorithms need to compute a separator of the new graph from scratch, without 
using any information from the previous separator. Armon and Reif constructed
a dynamic separator algorithm for planar graphs , but their algorithm is prob- 
abilistic (works in 0(log^ n) expected time per update) and it needs as an input 
the so called sphere packing representation pj of the input planar graph. Al- 
though such a sphere representation is known to always exist, it is currently not 
known whether it is computable in polynomial time. 

The main difficulty for designing a fast deterministic algorithm for maintain- 
ing partitions of dynamic planar graphs is related to the fact that most known 
algorithms for constructing separators for static graphs use a breadth-first search 
as an essential step of the computation. There are no known algorithms that can
dynamically maintain a breadth-first tree of a planar graph in polylogarithmic 
time. 

In this paper we develop an algorithm that avoids recomputation of breadth- 
first trees. Our approach is based on a representation of the current graph as 
a hierarchy of partitions of different grades for that graph. By handling the 
modification at an appropriate partition level, we can achieve rebalancing by
reallocating a small number of faces. Specifically, we prove that our representation
allows a separator decomposition to be recomputed in O(log n) time after a local
insertion or deletion of a vertex or edge that preserves planarity, biconnectivity,
and bounded face sizes, where n is the current number of faces of the graph. The
total initial partition and the data structure can be computed in O(n) time by
using O(n) space.

The paper is organized as follows. In Section 2, we introduce the notation
and outline our approach. In Section 3, we specify the requirements on the
partitions at each level of the hierarchy and describe an efficient algorithm for their
construction. The data structure describing the partitions at different levels of
the hierarchy and its dynamic maintenance are discussed in Section 4. Finally,
in Section 5 we formulate our main theorem and comment on the results and their
implications.



2 Preliminaries and Algorithm Outline 



Embeddings and Regions 

A graph is planar if it can be embedded in the plane so that no two edges 
intersect except possibly at a shared endpoint. A graph G already embedded in 
the plane is called a plane graph. Faces of G are the maximal connected regions 
into which the embedding of G divides the plane. There is exactly one infinite 
face called the outer face of the embedding, all other faces are called internal. 
A face can be identified by the cycle of edges on its boundary. By V(G), E(G),
and F(G) we will denote the sets of vertices, edges, and faces of G, respectively.

Any set of faces is called a region of G. The subgraph that consists of the
vertices and the edges incident with the faces of a region R is called the subgraph
induced by R and will be denoted by G(R). A region R is connected if the dual
graph of G(R) is biconnected. The maximal connected subregions of R are called
connected components of R.

An edge e of G(R) is called an inner edge of R if both faces incident to e
belong to R. Otherwise e is called a boundary edge. The boundary $\partial R$ of a region
R is the subgraph induced by the boundary edges of R.

A partition $\mathcal{R}$ of F(G) is any set of regions $\{R_1, \ldots, R_r\}$ such that each
face of G belongs to exactly one region of $\mathcal{R}$. A partition $\mathcal{R}$ is called connected
if all of its regions are connected and it is called weakly connected if it is either
connected or each region has at most two neighboring regions. The boundary
$\partial\mathcal{R}$ of $\mathcal{R}$ (also called the boundary graph of $\mathcal{R}$) is the union of the boundaries
$\partial R_j$, $j = 1, \ldots, r$, of its regions.

In the typical case $\partial\mathcal{R}$ may contain long paths of degree 2 vertices. In order to
save time and space when dealing with such boundaries, we define a compressed
boundary $GB(\mathcal{R})$ of $\partial\mathcal{R}$ to be the plane graph resulting after the contraction of
any maximal simple path of degree 2 vertices in $\partial\mathcal{R}$ to a single edge and any
simple cycle to a triangle. $GB(\mathcal{R})$ is a graph with $|\mathcal{R}|$ internal faces corresponding
to the regions of $\mathcal{R}$.

Definition 21 Let G be an n-face plane graph and ε > 0. The partition $\mathcal{R}$ is
said to be an ε-partition of G if no region of $\mathcal{R}$ has more than εn faces.

We will make use of the following generalization of a result from p. 

Theorem 1 Let G be a plane graph with n faces whose maximal size is d and
let $h \ge d^2$ be an integer. Then there exists a weakly connected h/n-partition
$\mathcal{R} = \{R_1, \ldots, R_r\}$ of G that satisfies the conditions

(i) $|\partial R_i| \le \sqrt{h}$;    (1)

(ii) $|\mathcal{R}| \le cn/h$,    (2)

where c > 1 is a constant.

In this paper we present an algorithm for maintaining a hierarchy of
ε-partitions of a dynamic planar graph G. This algorithm can be used to solve
the load balancing problem, since for any integer p > 1 one can pick an
ε-partition for ε ~ 1/p and then distribute the regions between the processors
using a greedy method so that the sum of the sizes of all regions assigned to a
processor does not exceed n/p + εn < 2n/p. If better balancing is required, we
just need to reduce the value of ε accordingly.
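
One possible greedy method is sketched below. The paper does not spell out its greedy rule, so the least-loaded-first policy and the function name `distribute` are assumptions for illustration. Whenever a region is placed, the least-loaded processor holds at most n/p faces, so its load afterwards is at most n/p plus the size of that region, i.e. at most n/p + εn.

```python
import heapq
from typing import List, Tuple


def distribute(region_sizes: List[int], p: int) -> Tuple[List[int], List[List[int]]]:
    """Assign each region to the currently least-loaded of p processors.

    Returns (loads, assignment), where assignment[q] lists the region
    indices given to processor q.
    """
    heap = [(0, q) for q in range(p)]          # (current load, processor id)
    heapq.heapify(heap)
    loads = [0] * p
    assignment: List[List[int]] = [[] for _ in range(p)]
    for idx, size in enumerate(region_sizes):
        load, q = heapq.heappop(heap)          # least-loaded processor
        assignment[q].append(idx)
        loads[q] = load + size
        heapq.heappush(heap, (loads[q], q))
    return loads, assignment
```

With region sizes bounded by εn, the maximum entry of `loads` stays within n/p + εn regardless of the order in which regions arrive.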

Our algorithms will handle the following update and query operations. We 
assume that the original graph is plane with face size bounded by some constant 
d and that no operation violates planarity or creates a face with size greater 
than d. 

— insert_vertex(v,e): Adds a new vertex v and replaces edge e = (u,w) by two
edges (u,v) and (v,w).

— delete_vertex(v): Deletes vertex v of degree two and its incident edges (u,v)
and (v,w) and adds edge (u,w).

— insert_edge(u,w,F): Adds a new edge (u,w) inside the face F. Assumes u and
w are non-adjacent vertices on F.

— delete_edge(e): Deletes edge e = (u,w). Assumes the degrees of u and w are
greater than two.

— list_separator(ε): Given any ε > 0, lists an εn-separator of the current graph.

Note that these update operations are powerful enough to allow any plane graph 
to be transformed into any other plane graph using a sequence of update oper- 
ations. 

Our Approach 

We need to maintain a representation of the graph G that will support fast
separator queries, where each query will ask for an ε-partition of the current
graph. For that purpose we maintain a balanced tree called a partition tree that
represents a hierarchy of partitions of G. The partition corresponding to the root
defines the coarsest partition of G into at most h regions of roughly the same
size, where h will be an appropriately chosen constant. Each subsequent level
defines a finer partition, with the leaves corresponding to single faces. Thus, the
levels of the partition tree form a hierarchy of partitions of G. In particular, the
leaves of T, on level 0, represent the faces of G, the nodes at level 1 represent
the regions of an h/n-partition $\mathcal{R}^{(1)}$, the nodes at level 2 represent regions of an
h/n-partition of the graph $GB(\mathcal{R}^{(1)})$, and so on.

When a local change is applied on the graph G, it can affect only nodes that
are ancestors of the face where the change occurs. The algorithm maintains a
number of local invariants that guarantee the validity of the partitions and the
small height of the tree. The algorithm locates the lowest level of T where one
of these invariants is violated (if any). Suppose that at node N one of these
invariants is violated. The algorithm considers the subtree consisting of N plus
its children and grandchildren and redefines the children of N by finding a new
partition of the graph induced by the grandchildren of N. This subtree has O(1)
size and thus it can be rebalanced in O(1) time so that all invariants hold. Since
the same procedure might need to be applied to all ancestor nodes from N to
the root of T, the total update time is proportional to the height of T, which
will be shown to be O(log n).

3 P-Tree Data Structure

In this subsection we define and study the properties of a data structure, called a 
P-tree, for describing partitions of a plane graph G and show that the structure 
can be constructed in linear time. For the initial construction of the tree, we will 
define a sequence (hierarchy) of partitions whose elements will be then stored at 
the nodes of the tree. 

More precisely, let G be a plane graph with n faces each of size not exceeding
d. Let h be an integer constant such that $h \ge \max(d^2, 2c)$, where c is the constant
from Theorem 1. By applying Theorem 1 iteratively, we will construct a sequence
of graphs and their partitions

$S = G^{(0)}, \mathcal{R}^{(1)}, G^{(1)}, \ldots, \mathcal{R}^{(l)}, G^{(l)}$

called a GR-sequence, so that the graph $G^{(0)}$ is the original graph G and the
region partition $\mathcal{R}^{(l)}$ consists of a single region of no more than h faces.

Constructing the GR-Sequence 

Assuming that $G^{(i-1)}$ has already been constructed for some $i \ge 1$ and
that it has $n_{i-1} > h$ faces with each face boundary of at most $\sqrt{h}$ edges, we can
construct $G^{(i)}$ from $G^{(i-1)}$ by applying the following procedure. We let $G^{(0)} = G$.

Algorithm GR-sequence 

Step 1: Apply Theorem 1 with a parameter h on $G^{(i-1)}$ to find an
$h/n_{i-1}$-partition $\mathcal{R}^{(i)}$ of $G^{(i-1)}$ such that $|\mathcal{R}^{(i)}| \le cn_{i-1}/h$.

Step 2: For each region $R \in \mathcal{R}^{(i)}$ construct the subgraph $G^{(i-1)}(R)$ of $G^{(i-1)}$
induced by R and find its boundary $\partial R$.

Step 3: If $\mathcal{R}^{(i)}$ is a connected partition, then $G^{(i)}$ will just be the compressed
boundary of $\mathcal{R}^{(i)}$. In the general case, replace each region of $\mathcal{R}^{(i)}$ by a single
face of $G^{(i)}$ as follows.

3.1. Compute the compressed boundary graph $GB = GB(\mathcal{R}^{(i)})$ of $\mathcal{R}^{(i)}$. For
each new edge e defined during the compression store the list of edges
of the path corresponding to e.

3.2. For any non-connected region $R_{nc}$ of $\mathcal{R}^{(i)}$ merge the faces in GB
corresponding to the connected components of $R_{nc}$ into one of those faces
(arbitrarily selected).

Note that the above algorithm constructs $G^{(i)}$ together with its embedding in
the plane and its faces correspond to the regions of $\mathcal{R}^{(i)}$. According to Theorem 1,
$G^{(i)}$ has no more than $cn_{i-1}/h$ faces and the size of each face does not exceed
$\sqrt{h}$. Thus the same algorithm can be used for constructing $G^{(i+1)}$ assuming $G^{(i)}$
has at least h faces.

Since the parameter h was chosen to be at least 2c, the graph $G^{(l)}$ will be
obtained in no more than log n + 1 iterations. Iteration k takes time proportional
to the size of $G^{(k)}$ and thus the total time for constructing the GR-sequence
will be

$\sum_{k=0}^{l-1} O((c/h)^k n) = O(n).$
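
The geometric decay behind this bound can be checked numerically. The sketch below only tracks the upper bounds $(c/h)^k n$ on the number of faces per level; the function name and the sample values of n, h, and c are assumptions chosen for illustration.

```python
import math
from typing import List


def level_size_bounds(n: int, h: int, c: int) -> List[float]:
    """Upper bounds (c/h)^k * n on the number of faces of the k-th graph.

    Iteration stops once the bound drops to h or below, i.e. once a single
    region of at most h faces suffices.
    """
    bounds = [float(n)]
    while bounds[-1] > h:
        bounds.append(bounds[-1] * c / h)
    return bounds


# Hypothetical parameters with h >= 2c, so each level is at most half the previous one.
bounds = level_size_bounds(n=10**6, h=16, c=2)
total_work = sum(bounds)           # stays below 2n because c/h <= 1/2
num_iterations = len(bounds) - 1   # stays below log2(n) + 1
```

Because c/h is at most 1/2, the sum is dominated by its first term, which is the reason the whole sequence is built in linear time.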



Constructing the P-Tree 

Let G be a plane graph for which a GR-sequence S has already been
constructed. A P-tree for G with respect to S is a data structure whose elements
are associated with a rooted tree $T_h(G)$ with l + 1 levels. The k-th level of $T_h(G)$

Fig. 2. Information associated with node N at level k of $T_h(G)$: the face F(N),
the region R(N), and the graph $\partial R(N)$. N has 4 children corresponding to the
4 faces of R(N). The relevant subgraphs of $G^{(k-1)}$ and $G^{(k)}$ are shown. Each
face of $G^{(k)}$ corresponds to a region of G.



for $k = 1, \ldots, l$ contains the information gathered during the k-th iteration of
the construction of S, namely, the partition $\mathcal{R}^{(k)}$, its boundary $\partial\mathcal{R}^{(k)}$, and the
graph $G^{(k)}$. The nodes at the k-th level of the P-tree correspond to the regions
of $\mathcal{R}^{(k)}$ (or, equivalently, to the faces of $G^{(k)}$). The leaves of $T_h(G)$ are at level
0 and each leaf corresponds to a face of the original graph $G = G^{(0)}$. The root
node is at level l and contains the graph $G^{(l)}$.

With each node N on level k > 0 we associate the following information
(Figure 2):

(i) F(N) - the face of $G^{(k)}$ corresponding to N;

(ii) $\mathcal{R}(N)$ - the region of $G^{(k-1)}$ corresponding to F(N); and

(iii) $\partial\mathcal{R}(N)$ - the boundary graph of $\mathcal{R}(N)$.

If N is a node on level 0, then F(N) is defined as in (i) above and
$\mathcal{R}(N) = \partial\mathcal{R}(N) = F(N)$. Let G(N) denote the graph $G(\mathcal{R}(N))$.

Note that the amount of data associated with any node N of T is proportional
to the size of G(N). Since G(N) is a planar graph with no more than h faces and
each face has size at most $\sqrt{h}$, the size of G(N) is $O(h^{3/2})$ (which is a constant
since h = O(1)).

The edges of $T_h(G)$ (called links hereafter to be distinguishable from the edges
of $G^{(k)}$) connect certain pairs of nodes on consecutive levels. More precisely, there
is a link between a node N from level k − 1 corresponding to a region R, and
a node N' from level k corresponding to a region R', iff R is transformed by
Algorithm GR-sequence into a face of R'. By Theorem 1 each node of $T_h(G)$
has at most h children.

Note that any edge of $G^{(k-1)}$ appears in either three or four nodes of the tree
$T_h(G)$. It appears twice on the (k − 1)-th level at the nodes that correspond to
its two incident faces and also at one or two nodes on the k-th level, depending
on whether this edge is entirely inside some of the regions of $\mathcal{R}^{(k)}$, or lies between
two such regions. We assume that there are pointers between any edge and its
occurrences on the lower level. These pointers define an edge forest EF on the
set of edges of $G^{(k)}$ for $k = 0, \ldots, l$ that is consistent with $T_h(G)$. Namely, all
edges on a path that is compressed to an edge e are descendants of e. Thus, the
edges of the graph $G^{(k)}$ for $k \ge 1$ have descendants that are edges of $G^{(k-1)}$. On
the other hand, an edge e' from level k − 1 has an ancestor if and only if e' is
on the boundary of some of the regions of $\mathcal{R}^{(k)}$.

In accordance with EF, we define weights wt(·) on the edges of $G^{(k)}$ as the
number of their descendants in EF at level 0. The weight of the boundary of
a region or a face is defined as the sum of the weights of the edges on that
boundary.
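
The weight definition can be sketched as a recursion over the edge forest. The dictionary representation, the function names, and the convention that a level-0 edge has weight 1 (so that sums over boundaries count original edges) are assumptions made for this illustration.

```python
from typing import Dict, Hashable, Iterable, List


def edge_weights(children: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, int]:
    """Compute wt(e) for every edge of the forest.

    `children[e]` lists the lower-level edges on the path that was compressed
    into e; level-0 edges have no entry (or an empty list) and get weight 1.
    """
    wt: Dict[Hashable, int] = {}

    def visit(e: Hashable) -> int:
        if e not in wt:
            kids = children.get(e, [])
            wt[e] = 1 if not kids else sum(visit(k) for k in kids)
        return wt[e]

    for e in list(children):
        visit(e)
    return wt


def boundary_weight(boundary: Iterable[Hashable], wt: Dict[Hashable, int]) -> int:
    """Weight of a face or region boundary: sum of its edge weights."""
    return sum(wt[e] for e in boundary)
```

For example, an edge compressed from a three-edge path and a single edge has weight 4, which is the number of original edges it stands for.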

Definition 31 A P-tree node N is called balanced, if all of the following three
conditions are satisfied.

(B1) N and each child of N have no more than h children each.

(B2) The ratio between the number of the children and the number of the
grandchildren of N does not exceed c/h, where c is the constant from Theorem 1.

(B3) For any child $N_i$ of N, $wt(\partial F(N_i)) \le d h^{(k-1)/2}$, where k is the level of N.

A P-tree is called balanced if all its non-root nodes are balanced.
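
Conditions (B1) and (B2) can be checked locally, as the following sketch shows. The class `PNode` and the choice to skip (B2) for nodes without grandchildren are assumptions for this example; (B3) is omitted because it needs the boundary weights.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PNode:
    children: List["PNode"] = field(default_factory=list)


def satisfies_b1_b2(node: PNode, h: int, c: int) -> bool:
    """Check conditions (B1) and (B2) for a single node."""
    kids = node.children
    grandkids = [g for k in kids for g in k.children]
    # (B1): N and each child of N have at most h children.
    b1 = len(kids) <= h and all(len(k.children) <= h for k in kids)
    # (B2): children / grandchildren <= c / h, written without division.
    b2 = not grandkids or len(kids) * h <= c * len(grandkids)
    return b1 and b2
```

Only the node, its children, and its grandchildren are inspected, so the check touches a constant number of nodes when h is a constant.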

By Algorithm GR-sequence and Theorem 1 we have the following.

Lemma 31 If G is a plane graph with n faces, then the algorithm from this
section constructs in O(n) time a balanced P-tree for G.

Proof. Since Step 1 of Algorithm GR-sequence constructs an $h/n_{i-1}$-partition
of $G^{(i-1)}$, any node of the tree has at most h children, implying (B1). Property
(B2) follows from conditions (1) and (2) of Theorem 1. Finally, Property (B3)
follows from condition (1) of Theorem 1, the assumption that the maximum face
size of the original graph is no more than d, and an induction on k.

We will maintain our P-tree balanced. 



3.1 Update and Query Operations 

Next we describe the basic operation on P-trees used to maintain the balance
property of nodes. It is possible that more than one node is unbalanced at a
time, but all unbalanced nodes belong to a single simple path from a certain node
to the root of the tree.

Balancing the Tree 

We will first describe the basic algorithm for maintaining the balance of the
P-tree, called Fix-Tree.

Algorithm Fix-Tree starts at an unbalanced node N on the lowest possible
level and makes the subtree consisting of N and all its descendants balanced in
constant time. Then the same operation is applied on the parent of N, and so
on, until the root is processed.

Let N be a node at level k, $2 \le k \le l + 1$, that is unbalanced, but all its
proper descendants are balanced. Denote the subgraph of $G^{(k-2)}$ induced by the
faces of the grandchildren of N by $G^{(k-2)}(N)$. Repartition $G^{(k-2)}(N)$ using the
algorithm from Theorem 1 and construct again the portion of the tree (with
height two) rooted at N, its children, and its grandchildren using an algorithm
similar to the one for construction of P-trees. The time required to fix N will be
proportional to the number of faces of $G^{(k-2)}(N)$, which is $O(h^2)$ by Property
(B1) of Definition 31. Since the height of the P-tree is O(log n), balancing the
tree will take O(log n) time.
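
The bottom-up control flow of Fix-Tree can be sketched as below. The callbacks `is_balanced` and `rebuild` are hypothetical placeholders for the balance test and the constant-time local repartitioning; only the walk towards the root is illustrated.

```python
from typing import Callable, Optional


class TreeNode:
    def __init__(self, parent: Optional["TreeNode"] = None) -> None:
        self.parent = parent


def fix_tree(start: TreeNode,
             is_balanced: Callable[[TreeNode], bool],
             rebuild: Callable[[TreeNode], None]) -> int:
    """Walk from `start` up to the root, rebuilding each unbalanced node.

    Returns the number of nodes visited, which is at most the tree height
    plus one.
    """
    visited = 0
    node: Optional[TreeNode] = start
    while node is not None:
        if not is_balanced(node):
            rebuild(node)          # constant work when h is a constant
        visited += 1
        node = node.parent
    return visited
```

Since each step does constant work, the total cost is proportional to the number of visited nodes, matching the O(log n) bound stated above.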

Implementation of the Update Operations 

We will describe and analyze the implementation of the four update opera- 
tions described in Section 2. Our assumption is that the current graph G is a 
connected plane graph with no vertices of degree 1 and no face size exceeding d 
and that the update operations preserve these properties. By T we denote the 
current P-tree of G. 

insert_vertex(v,e) The operation asks that edge e = (u,w) be replaced by
edges $e_1$ = (u,v) and $e_2$ = (v,w).

Let $F_1$ and $F_2$ be the two faces of G incident to e and let $N_1$ and $N_2$ be
the corresponding leaf nodes in T. First, update at nodes $N_1$ and $N_2$ the
corresponding descriptions of $F_1$ and $F_2$ by deleting e and inserting $e_1$ and
$e_2$ into the corresponding doubly linked lists. Update the weights of all edges
that were ancestors of e in the edge forest EF associated with T. Since these
updates can change the weights of the faces associated with the ancestors of
$N_1$ and $N_2$, we need to apply the Fix-Tree algorithm on the lowest unbalanced
ancestors of each of $N_1$ and $N_2$.

delete_vertex(v) This operation deletes a vertex v of degree two, thereby
changing the common boundary of the two faces incident to v. Hence, delete_vertex
can be implemented in a similar way as insert_vertex by changing the
information associated with the two leaf nodes representing the faces incident
to v and updating the weights of O(1) edges per level of the edge forest
associated with T.

insert_edge(u,w,F) Given two non-adjacent vertices u and w on the same face
F of G, insert_edge adds a new edge e = (u,w) inside F.

Denote by $F_1$ and $F_2$ the faces into which e splits F. Delete the leaf node
N that corresponds to F and create two new nodes $N_1$ and $N_2$ representing
the new faces. Make $N_1$ and $N_2$ children of the parent PN of N. Next,
update the region subgraph stored at PN by adding to $\mathcal{R}(PN)$ the edge e
inside face F. Finally, execute algorithm Fix-Tree on the lowest unbalanced
ancestor of PN.

delete_edge(e) This operation deletes edge e = (u,w), assuming the degrees of
u and w are greater than two.

Let $F_1$ and $F_2$ be the two faces of G incident to e and let $N_1$ and $N_2$ be the
corresponding nodes in T. Let F be the face that results after the deletion
of e and the merge of $F_1$ and $F_2$. In order to implement that operation, we
delete $N_1$ and $N_2$ and create a new leaf node N representing the face F.
Assume that $N_1$ and $N_2$ have different parents, say $PN_1$ and $PN_2$. In this
case we delete $N_2$ and replace $N_1$ with the new node N. As a result, the
number of children of $PN_2$ is reduced by one and one of the children of $PN_1$
represents a larger region. We make the corresponding changes in the parents
$PN_1$ and $PN_2$. These include updates in the structures $F(PN_i)$, $\mathcal{R}(PN_i)$,
and $\partial\mathcal{R}(PN_i)$, i = 1, 2. In particular, e is deleted from both faces $F(PN_1)$
and $F(PN_2)$ and all edges of $F_2$. We continue to make similar changes on
the ancestors of $F(PN_1)$ and $F(PN_2)$ until the nearest common ancestor of
$N_1$ and $N_2$ is reached.

Finally, we run the Algorithm Fix-Tree on the parents of $N_1$ and $N_2$.

If as a result of some update operation the degree of the root R becomes one,
then we cut R. In case the degree of R becomes greater than h + 1, then we define
a new vertex R' to be a parent of R and run Algorithm Fix-Tree on vertex R'.

Note that any of the update operations described above changes information
at only a constant number of nodes at any level of T and that we applied Fix-Tree
on only one or two nodes of T. If all nodes of T are balanced, then by Condition
(B2) the height of T is at most $\log_{h/c} n \le \log n$, since we have chosen $h \ge 2c$.
Hence the time complexity of Algorithm Fix-Tree is also O(log n). Thus we have
the following lemma.

Lemma 32 A balanced P-tree representing a planar connected graph G with no
vertex of degree one can be maintained subject to the operations insert_vertex,
delete_vertex, insert_edge, and delete_edge in O(log n) time per operation,
assuming the maximum face size never exceeds a constant d.

Extraction of an e- Partition 

Recall that any node $N_i$ at the k-th level represents a face $F(N_i)$ of $G^{(k)}$. 
Similarly, the children of $N_i$ correspond to the faces of $\mathcal{R}(N_i)$ (the region of 
$G^{(k-1)}$ associated with $F(N_i)$), and so on. Therefore, $N_i$ defines a region in the graph 
$G = G^{(0)}$ represented by the leaves that are descendants of $N_i$. We denote that 
region of G by $R(N_i)$. Let $\mathcal{R}(G, k)$ denote the partition $\{R(N_1), \dots, R(N_s)\}$ of 
G, where $\{N_1, \dots, N_s\}$ is the set of all nodes on level k. 
Lemma 33 Let $1 \le k \le l$ and $\varepsilon_k = h^k/n$, where n is the number of faces in 
G. Then the partition $\mathcal{R}(G, k)$ is an $\varepsilon_k$-partition of G with boundary of size not 

exeeeding dd — . 

Proof. Clearly, $\mathcal{R}(G, k)$ is a partition of G since any face of G (which is a leaf 
of the P-tree of G) belongs to exactly one of the regions $R(N_i)$. For the size of 
the regions of the partition $\mathcal{R}(G, k)$ we have 

$$|R(N_j)| \le h^k = \varepsilon_k n, \quad \text{for } j = 1, \dots, s,$$

since the maximum degree of T is no more than h + 1. For the size of the 
boundaries $\partial R(N_i)$ we have 

$$|\partial R(N_i)| = \mathrm{wt}(F(N_i)) \le d\,h^{k/2},$$

according to Condition (B2) of Definition The number of nodes at level k is 
at most where Z + 1 is the number of the levels in T. Therefore, the total 
weight of the boundary of TZ{G, k) is 

rlogn/log(?i/c) 

wt{n{G,k)) < dh^^'^h^-^ = dh^-^/^ < d J 

Zl2 

log n log h / ^ ] 

= d2 i°s('“/-=) /y/£k = 

Now, in order to process a query asking for an ε-partition of G for some $\varepsilon \in (0, 1)$, 
we determine the level k for which $h^{k-1} < \varepsilon n \le h^k$ and apply Lemma 33. 
Let us estimate the time necessary to list the boundary of the resulting partition. 
Recall that F(N) is a compressed image of the boundary $\partial\mathcal{R}(N)$ of the region 
$\mathcal{R}(N)$ of $G^{(k-1)}$. Each edge of F(N) represents a path on $\partial\mathcal{R}(N)$. Each edge of 
$\partial\mathcal{R}(N)$ itself represents a path in $G^{(k-2)}$ (if $k \ge 2$), and so on. Therefore, each 
edge of F(N) can be traced down to a path in the original graph $G = G^{(0)}$. The 
boundary $\partial R(N)$ of the region R(N) can be extracted from the P-tree in time 
$O(|\partial R(N)|)$. Thus the boundary of the partition can be listed in time proportional 
to its size, which is $O(\sqrt{n/\varepsilon})$. Thus we have the following lemma. 

Lemma 34 The k-th level of a P-tree represents an $(h^k/n)$-partition of G whose 
boundary B can be listed in O(|B|) time. 

We summarize the above results in the following theorem. 

Theorem 2 Let G be a plane graph with n faces. Then for any δ > 0 a data 
structure exists that 

(i) can be constructed in O(n) time; 

(ii) supports operations insert_vertex, delete_vertex, insert_edge, and delete_edge 
in O(log n) time, assuming the maximum face size of any intermediate graph 
is O(1); 

(iii) supports operation list_separator(ε) for any ε > 0 in time proportional to 
the separator's size, which is bounded by $O(\sqrt{n^{1+\delta}/\varepsilon})$. 

Proof. Follows from Lemmas for h = . 
1 Introduction 

Each test or feature in a classification system defines a set partition on a class of 
objects. Adding new features refines the classification, whereas deleting features 
may result in merging previously distinguished classes. As an illustration, con- 
sider the set of automobile types { VW Beetle, Toyota, Lexus, Cadillac }. The 
feature size partitions the cars into sets of small and large cars, {{ VW Beetle, 
Toyota}, { Lexus, Cadillac }}. The feature domestic-origin partitions the cars 
into {{ VW Beetle, Toyota, Lexus }, { Cadillac }}. The feature ugly-shape dis- 
tinguishes { VW Beetle, Cadillac } from { Toyota, Lexus }. Incorporating both 
size and origin induces the refined partition {{ VW Beetle, Toyota}, { Lexus }, 
{ Cadillac }}, whereas the union of all three features completely distinguishes 
the types of cars. In fact, size and ugly-shape are sufficient for complete 
identification, so domestic-origin could be deleted from the set of features without 
affecting the induced partition. 
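
The car example can be checked mechanically. The following is a minimal sketch, assuming each feature is represented as a dictionary from object to feature value; the function name `induced_partition` and the concrete value labels ("small", "foreign", and so on) are illustrative choices, not notation from the paper.

```python
def induced_partition(universe, features):
    """Group elements by the tuple of values they take under every feature."""
    groups = {}
    for x in universe:
        signature = tuple(feature[x] for feature in features)
        groups.setdefault(signature, []).append(x)
    # Sort the parts only to make the output order reproducible.
    return sorted(groups.values())


cars = ["VW Beetle", "Toyota", "Lexus", "Cadillac"]
size = {"VW Beetle": "small", "Toyota": "small",
        "Lexus": "large", "Cadillac": "large"}
origin = {"VW Beetle": "foreign", "Toyota": "foreign",
          "Lexus": "foreign", "Cadillac": "domestic"}
ugly = {"VW Beetle": True, "Cadillac": True,
        "Toyota": False, "Lexus": False}

print(induced_partition(cars, [size, origin]))
print(induced_partition(cars, [size, ugly]))
```

Recomputing the signature of every element from scratch costs time proportional to the number of features per element, which is exactly the cost the dynamic data structures in this paper aim to avoid.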

Efficiently maintaining the partition induced by a set of features is an im- 
portant problem in building decision tree classifiers. For example, in building 
an optical character recognition (OCR) system fl5llb| based on point-probe de- 
cision trees P|, each of the 1500-plus pixels in each character-sized window of 
the image may be evaluated as a possible feature. An important goal is to find 
a small, robust set of probe points sufficient to distinguish among the 70-plus 
characters in a font, a process that may require repeatedly inserting and deleting 
features to see the impact on the final classification. 

In this paper, we introduce techniques to speed up this process of feature 
identification. We propose a series of data structures for maintaining a collection 
of set partitions on elements U = {1, ..., n}. The data structures efficiently 
support the following three operations: 

— Insert(P,S) - add a new partition P to the set of partitions S. 

— Delete(P,S) - delete existing partition P from the set of partitions S. 

— Report (S) - report the set partition of U induced by the set of partitions in 
S. 

Previous Work. A variety of data structures for sets and set partitions are known, 
including dictionaries and bit vectors, but these are not directly applicable to 
our problem. The primary difficulty of our problem lies in the fact that deleting 
a set partition may or may not result in the merger of two parts of the current 
induced partition, depending upon which other set partitions are included in the 
data structure. Union-find data structures HH provide some support for merging 
disjoint subsets (as occurs on deleting a partition), but do not permit us to break 
up subsets (as occurs on adding a partition). 

Partition refinement techniques are used in a variety of algorithms, notably 
minimizing deterministic finite automata [12] and its generalizations [14]. Habib, 
et.al. HH demonstrate that partition refinement can lead to simple and efficient 
algorithms for graphs, strings, and matrices - although none of these operations 
involves deleting arbitrary set partitions. 

Yellin efficiently supports a variety of subset testing operations (in- 
sert/delete elements, create subsets, subset and intersection queries) in 
0(n^/^ log n) time per operation, but it is not clear how to use these opera- 
tions to improve even the naive bounds for our problem. Further, near matching 
lower bounds are known on the complexity of any data structure that supports 
these operations p|. We have recently learned of a data structure for maintain- 
ing dynamic set partitions in 0(n) amortized time under all three operations by 
Calinescu P|. 

The problem of maintaining induced set partitions can be reduced to 
updating an ambiguity graph on the n elements, where the presence of edge (i, j) 
indicates that elements i and j occur in different parts of at least one of the k par- 
titions. An extensive literature exists on efficient dynamic graph algorithms [Zj, 
for such tasks as maintaining connected components under edge insertion and 
deletion. However, the insertion or deletion of a single n-element set partition 
can affect the status of $O(n^2)$ edges in such an ambiguity graph, rendering such 
an approach infeasible. 

Our Results. In this paper, we present a collection of efficient and practical data 
structures for maintaining set partitions, as well as several generalizations of the 
problem. Particularly interesting is the variety of algorithmic techniques which 
they encompass, including classical balanced trees, randomization and random 
walks, suffix trees, and spanning trees of low stabbing number. In particular: 

— We provide a data structure that supports the insert and report operations 
in optimal O(n) worst-case time, for general set partitions. Deletion takes 
O(n lg k) time, where k is the number of set partitions currently in the data 
structure and n is the number of elements in each partition. Although these 
results are relatively straightforward, it appears nontrivial to improve all 
operations to O(n), which is the best possible complexity. 

— We provide randomized Monte Carlo and Las Vegas data structures that 
support all three operations on bipartitions in linear or near-linear expected 
time, although the Las Vegas bounds are amortized. The Monte Carlo data 
structure is asymptotically optimal, and the Las Vegas data structure is 
within a factor of α(n) of optimal. We believe that our Monte Carlo data 
structure is particularly practical because of its simplicity. It appears widely 
applicable and is used as a building block for other algorithms in the paper 
such as the Las Vegas data structure and the geometric data structures. 
— Robust classifiers compensate for noisy features by requiring more than one 
piece of evidence to distinguish between every pair of objects. We provide an 
alternate data structure that permits us to efficiently insert/delete partitions 
and query arbitrary pairs of elements {x, y} to obtain the approximate number 
of partitions currently distinguishing x from y. Insert/delete run in time 
O(n log log n) and query runs in polylogarithmic time. This data structure 
uses techniques from random walks on a line. Randomization and approximation 
appear to be powerful techniques in this setting because, to our 
knowledge, the best exact deterministic techniques require $O(n^2)$ time per 
insertion or deletion of partitions. 

— We provide an efficient data structure for maintaining geometric set par- 
titions, where the set partitions are induced by linear separators of points 
in the plane. We achieve $O(\sqrt{n}\log n)$ time for insertion/deletion and linear 
report time, after an initial 0(n^ ® log n) preprocessing step. 

— We provide the first data structures for efficiently maintaining sorted strings 
under character insertion/deletion. As an application, we use this structure 
to find the shortest run of distinguishing features from an ordering of k 
binary features in optimal O(nk) time. 

Our paper is organized as follows. In Section 2 we present deterministic data 
structures for maintaining set partitions. More efficient randomized data 
structures are presented in Section 3. The problem of maintaining robust classifiers 
is discussed in Section 4. The special case of set partitions induced by geometric 
arrangements is discussed in Section 5. A generalization of our problem, sorting 
strings under character insertion/deletion, is addressed in Section 6. 

2 Basic Results: Deterministically Maintaining Set 
Partitions 

In this section, we present an efficient data structure for deterministically main- 
taining set partitions under the operations insert, delete, and report. For delete, 
we assume that we are given a pointer to the set partition in question and 
hence defer the issue of retrieving these pointers to an auxiliary dictionary data 
structure. 

Notation. We use the following notation throughout the paper. Let the universal 
set U = {1, ..., n}. Each set partition P partitions U into parts(P) disjoint 
subsets $T_1, \dots, T_{parts(P)}$ with $T_1 \cup \dots \cup T_{parts(P)} = U$. Without loss of 
generality, we identify these subsets by the integers (1, ..., parts(P)), respectively. Let 
part(P, i) denote the part of P containing element i. 

Lemma 1. Let A and B be set partitions of U = {1, ..., n}. The induced 
partition (or refinement) of A and B can be computed in O(n) time. 

The proof of Lemma 1 appears in the full version of this paper. Repeated 
application of Lemma 1 yields a data structure that supports insertion and report 
in linear time but does not explicitly support deletion. A naive solution could 
recompute the induced partition from scratch on each deletion by repeatedly 
applying Lemma 1 for a total cost of O(kn) per deletion. 
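
Since the proof of Lemma 1 is deferred to the full version, the following is only one plausible way to compute a refinement, not necessarily the authors' method. It assumes a partition is stored as a list of part labels indexed by element, and it uses a hash table on label pairs, which gives expected rather than worst-case linear time; replacing the dictionary with a two-pass radix sort on the pairs would make the bound worst-case. The name `refine` is hypothetical.

```python
def refine(a, b):
    """Return the induced partition of a and b as a list of part labels.

    a[i] and b[i] are the labels of the parts containing element i.
    Two elements share a part in the result exactly when they share a
    part in both a and b. Result labels are 0, 1, 2, ... in order of
    first appearance.
    """
    if len(a) != len(b):
        raise ValueError("partitions must cover the same universe")
    label_of_pair = {}
    result = []
    for pair in zip(a, b):
        if pair not in label_of_pair:
            label_of_pair[pair] = len(label_of_pair)
        result.append(label_of_pair[pair])
    return result
```

Folding `refine` over all k stored partitions is the naive O(kn) recomputation mentioned above.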

Lemma 2. Set partitions can be dynamically maintained such that the insertion 
and deletion operations take O(n lg k) time, while report can be performed in 
O(n) time. 

Proof. We maintain a balanced binary tree whose k leaves comprise the set of 
input partitions S, and each intermediate node is the induced partition of its 
two children. Therefore, the root of this tree represents the induced partition of 
S, and can be produced in linear time to satisfy a report query. 

Insertion and deletion can be implemented as in any balanced binary tree 
such as jOj. Insertion and deletion in a red-black tree require 0(1) rotations 
in the worst case. Although each rotation affects only a constant number of 
nodes, the induced partitions at all O(lg k) internal nodes on the affected 
root-to-leaf path must be recomputed using Lemma 1, so that the time required 
for insertion and/or deletion is O(n lg k). □ 
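
The idea behind Lemma 2 can be sketched without the rebalancing machinery. The version below is a simplification and an assumption of this sketch: instead of a red-black tree it uses a complete binary tree with a fixed number of leaf slots (a power of two), where an empty slot holds the trivial one-part partition. The class name `PartitionTree`, the slot-based interface, and the hash-based `refine` helper are all hypothetical.

```python
def refine(a, b):
    """Induced partition of two label lists (expected linear time)."""
    labels, out = {}, []
    for pair in zip(a, b):
        out.append(labels.setdefault(pair, len(labels)))
    return out


class PartitionTree:
    """Complete binary tree over `capacity` leaf slots of partitions."""

    def __init__(self, n, capacity):
        if capacity < 1 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.n = n
        self.capacity = capacity
        # Heap layout: node v has children 2v and 2v + 1; leaves start at
        # index `capacity`. Every node starts as the trivial partition.
        self.node = [[0] * n for _ in range(2 * capacity)]

    def _set(self, slot, partition):
        v = self.capacity + slot
        self.node[v] = partition
        v //= 2
        while v >= 1:  # O(lg capacity) ancestors, O(n) work at each
            self.node[v] = refine(self.node[2 * v], self.node[2 * v + 1])
            v //= 2

    def insert(self, slot, partition):
        self._set(slot, list(partition))

    def delete(self, slot):
        self._set(slot, [0] * self.n)  # an empty slot separates nothing

    def report(self):
        return self.node[1]  # the root holds the induced partition
```

Each update rewrites only the nodes on one root-to-leaf path, which is where the O(n lg k) bound comes from.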

We note that a similar structure, called a partition tree, appears in a different 
context in the work of Yellin. We can modify the data structure of Lemma 2 to reduce 
the complexity of insertion to linear: 

Theorem 1. Set partitions can be maintained with O(n) insertion and report, 
and O(n lg k) deletion. 

Proof. Instead of maintaining a conventionally-balanced binary tree in the 
structure of Lemma 2, we maintain a forest of perfectly-balanced binary trees. 

As before, the leaf level of our forest contains all k of the input partitions. 
We number them from 1 to k according to time of insertion. On each insertion, 
we will add one leaf to one tree, and construct at most one additional internal 
node. Denoting where the leaves reside as level 0, we will add a new internal 
node at the i-th level every $2^i$-th insertion. For each level i, we will maintain a 
FIFO queue of pointers to the roots of trees of height i. In addition to this forest, 
we will also maintain a separate global induced partition, initially {{1, ..., n}}. 

Inserting a Partition: We insert the k-th partition as follows: 



1. Increment the partition counter k. Construct a new leaf node for 
partition $P_k$. Add the node k to the end of the level 0 queue. 

2. Refine the global set partition with $P_k$ using the algorithm of Lemma 1. 

3. Define j to be the largest integer such that $k + 1 \ge 2^j$. Compute b, 
the position of the least significant 1-bit of the binary representation 
of $s = k + 1 - 2^j$. If s = 0, then b is undefined. 

4. Unless s = 0, dequeue the two oldest elements A and B of level queue 
b. Merge the associated set partitions of A and B using the algorithm 
of Lemma 1. Construct a new internal node to contain the refined 
partition of A and B, and enqueue this node at level b + 1. 
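
The four insertion steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: queue entries hold only the partition of a tree root (the child pointers needed for deletion are omitted), `refine` is a hash-based stand-in for the algorithm of Lemma 1, and the names `PartitionForest` and `queues` are hypothetical.

```python
from collections import deque


def refine(a, b):
    """Induced partition of two label lists (stand-in for Lemma 1)."""
    labels, out = {}, []
    for pair in zip(a, b):
        out.append(labels.setdefault(pair, len(labels)))
    return out


class PartitionForest:
    def __init__(self, n):
        self.k = 0
        self.queues = [deque()]           # queues[i]: roots of height i, oldest first
        self.global_partition = [0] * n   # initially the single part {1, ..., n}

    def insert(self, partition):
        # Step 1: new leaf at the end of the level-0 queue.
        self.k += 1
        self.queues[0].append(list(partition))
        # Step 2: refine the global partition.
        self.global_partition = refine(self.global_partition, partition)
        # Step 3: j is the largest integer with k + 1 >= 2**j.
        j = (self.k + 1).bit_length() - 1
        s = self.k + 1 - (1 << j)
        if s == 0:
            return None                   # no internal node on this insertion
        b = (s & -s).bit_length() - 1     # position of the least significant 1-bit
        # Step 4: merge the two oldest roots of level b into a level b + 1 root.
        first = self.queues[b].popleft()
        second = self.queues[b].popleft()
        if len(self.queues) == b + 1:
            self.queues.append(deque())
        self.queues[b + 1].append(refine(first, second))
        return b + 1

    def report(self):
        return self.global_partition
```

Each call performs at most two refinements, so insertion stays linear in n; the schedule in steps 3 and 4 spreads the work of building higher levels evenly over the insertions.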
To implement report, we simply return the global result partition. To implement 
deletion, we replace the partition to be deleted by the partition of the last 
leaf to have been inserted, and delete the internal node (if any) constructed 
during the insertion of this leaf. We then recompute the O(lg k) induced partitions 
of the internal nodes on the two affected root-to-leaf paths. Finally, we compute 
the new global result partition by merging all the O(lg k) root partitions in our 
forest. Hence it takes O(n lg k) time to do a deletion. 

Note that the structure of the forest depends only on the number of partitions 
currently in S, and is independent of any deletions which may have taken place. 
Hence to measure the complexity of insertions we can safely assume that k 
insertions were performed sequentially. □ 

3 Randomized Data Structures for Maintaining Set 
Partitions 

For simplicity we assume that each partition that we insert is a bipartition, 
meaning it divides the elements into exactly two subsets. The results in this 
section can be generalized to the setting in which a partition breaks the elements 
into D sets, at an extra cost proportional to D. 

We say that an event E occurs with high probability (w.h.p.) if for any c > 0 
there exists a proper choice of constants such that $\Pr[E] \ge 1 - n^{-c}$. 

3.1 Monte Carlo Algorithm 

Colors of Elements. We first describe how to maintain the partition information. 
At each step t of the algorithm, an integer $C_t[i]$ is associated with each element i. 
We call $C_t[i]$ the color of element i at step t. Specifically, $C_t[i] \in \{0, \dots, P-1\}$, 
where P has size polynomial in n. That is, $P \in O(n^c)$, for some constant c. We 
maintain the invariant that, w.h.p., two elements i and j are in the same set 
at step t in the (cumulative) partition S iff they have the same color, that is, 

$C_t[i] = C_t[j]$. 

We also store the partitions $P_1 \dots P_k$ that comprise S, where the partitions 
are ordered by increasing insertion time. We can access any element in the set 
of partitions in time $O(\log k) \subseteq O(n)$ using a balanced tree or any other basic 
data structure. 

Inserting a Partition. A new partition is supplied as a 0-1 array A[1 . . . n], where 
$A[i] \in \{0, 1\}$. We insert the k-th partition in step t as follows. 



1. $r_k$ := randomly chosen integer $\in \{1, \dots, P-1\}$. 

2. $P_k[i] := r_k \cdot A_k[i]$, for i = 1 ... n. 

3. $C_t[i] := (C_{t-1}[i] + P_k[i]) \bmod P$, for i = 1 ... n. 

4. Store $P_k[1 \dots n]$ in the list of partitions. 



Note that once we have calculated $C_t[i]$, we no longer need to store $C_{t-1}[i]$. 
Deleting a Partition. We delete a partition $P_k$ (in step t) as follows. 



1. Find $P_k[1 \dots n]$ in the partition list. 

2. $C_t[i] := (C_{t-1}[i] - P_k[i]) \bmod P$, for i = 1 ... n. 

3. Delete $P_k[1 \dots n]$ from the partition list. 



Observe that errors are one-sided. Namely, if two elements have different colors, 
they belong to different sets of S. An error occurs whenever two elements as- 
signed the same color actually belong to different sets. We obtain the following 
theorem. 

Theorem 2. Insertions and deletions of partitions are executed in time O(n). 
If the algorithm runs for a polynomial number of steps, then for sufficiently large 
$P = O(n^c)$, w.h.p. all insertions and deletions are executed correctly. 

The proof of Theorem 2 appears in the full version of this paper. We note 
that the Monte Carlo algorithm should be extremely fast because it only uses 
a small number of additions and subtractions. It is interesting to note that our 
Monte Carlo algorithm suffices for the practical application of building a tree that 
distinguishes all objects using a small number of probes. Our randomized scheme 
will never classify two objects as different which are in fact indistinguishable, 
and hence the only consequence of being unlucky is to add a small number of 
additional probes to the test set. 
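
The Monte Carlo scheme is short enough to sketch in full. The code below follows the insertion and deletion steps above under a few assumptions of this sketch: the modulus is fixed at $2^{61} - 1$ rather than chosen as a polynomial in n, each stored partition is kept as the pair (multiplier, 0-1 array) instead of the product array $P_k$, and partitions are addressed by an integer handle. The names `MonteCarloPartitions`, `handle`, and `report` are illustrative.

```python
import random


class MonteCarloPartitions:
    def __init__(self, n, modulus=(1 << 61) - 1, seed=None):
        self.P = modulus
        self.rng = random.Random(seed)
        self.color = [0] * n       # C[i], all elements start in one part
        self.stored = {}           # handle -> (r_k, A_k)
        self.next_handle = 0

    def insert(self, bits):
        """Insert a bipartition given as a 0-1 array; return its handle."""
        r = self.rng.randrange(1, self.P)          # r_k in {1, ..., P - 1}
        for i, bit in enumerate(bits):
            if bit:
                self.color[i] = (self.color[i] + r) % self.P
        handle = self.next_handle
        self.next_handle += 1
        self.stored[handle] = (r, list(bits))
        return handle

    def delete(self, handle):
        """Undo the color changes made when this partition was inserted."""
        r, bits = self.stored.pop(handle)
        for i, bit in enumerate(bits):
            if bit:
                self.color[i] = (self.color[i] - r) % self.P

    def report(self):
        """Group elements by color (0-based element indices)."""
        groups = {}
        for i, c in enumerate(self.color):
            groups.setdefault(c, []).append(i)
        return sorted(groups.values())
```

Elements that no stored partition separates always receive the same color, so the only possible error is reporting two separated elements as one part, which matches the one-sided error described above.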

3.2 Las Vegas Algorithm 

We now describe the Las Vegas algorithm for the set partition problem. The 
time complexity is almost the same as for the Monte Carlo algorithm except 
that now it is amortized. In order to make the algorithm Las Vegas, we remove 
the probability of error from the invariant that whenever two elements have the 
same color, they are part of the same set in S. Thus, each time we perform an 
insertion or deletion, we must verify that this invariant holds. 

Verifying an Insertion. We first show how to verify an insertion for each step t. 
This entails adding an additional step at the end of the operation: 



5. Verify that for all i, j, $(C_t[i] = C_t[j]) \Rightarrow (C_{t-1}[i] = C_{t-1}[j])$. 

6. If so, continue. If not, an error is found. Spawn off an independent 
execution or run any other alternative protocol. 



Step 5 can be executed in linear time by maintaining some additional 
structure of the elements. Namely, in each step we will keep the elements in sorted 
order by increasing color. To do so, we use an additional array $\Pi_t[1 \dots n] = 
\pi_1^t \dots \pi_n^t$, where $\pi_\ell^t = j$ means that in step t, j is the element with the ℓ-th 
smallest color. (Ties are broken by taking into account the number of the 
element.) If the ordering $\Pi_{t-1}[1 \dots n]$ in step t − 1 is known, the ordering $\Pi_t[1 \dots n]$ 
can be computed in linear time by merging three sorted (and interleaved) lists: 

• the elements of $\Pi_t[1 \dots n]$ whose colors do not change between step t − 1 
and step t, 

• the elements of $\Pi_t[1 \dots n]$ whose colors increase by $r_k$, 

• the elements of $\Pi_t[1 \dots n]$ whose colors increase by $r_k$ and then (by the 
rules of modular arithmetic) decrease by P. 

Thus, step 5 can be broken into these substeps: 



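5a. Compute $\pi_1^t \dots \pi_n^t$ from $\pi_1^{t-1} \dots \pi_n^{t-1}$ by merging three lists. 

5b. Verify that for $\ell = 1 \dots n-1$, $(C_t[\pi_\ell^t] = C_t[\pi_{\ell+1}^t]) \Rightarrow (C_{t-1}[\pi_\ell^t] = C_{t-1}[\pi_{\ell+1}^t])$. 

This test requires linear time because we only have to compare the 
color of each element $\pi_\ell^t$ with its neighboring elements $\pi_{\ell-1}^t$ and $\pi_{\ell+1}^t$ 
in the ordering. 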
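
Steps 5a and 5b can be sketched as below. This is a hedged illustration rather than the authors' implementation: elements are 0-based, `order` is assumed to list the elements sorted by (old color, element number), and the three-way merge is delegated to the standard library's `heapq.merge`, which is linear for a constant number of input lists. The function name `verify_insertion` and the use of `RuntimeError` to signal a detected collision are choices of this sketch.

```python
from heapq import merge


def verify_insertion(old_color, order, bits, r, P):
    """Apply an insertion with multiplier r and verify it.

    Returns (new_color, new_order). Raises RuntimeError if two elements
    that had different colors now share a color.
    """
    new_color = [(c + r * b) % P for c, b in zip(old_color, bits)]
    # Each of the three lists inherits its sorted order from `order`,
    # because adding a constant (or adding r and subtracting P)
    # preserves relative order within the list.
    unchanged = [i for i in order if not bits[i]]
    shifted = [i for i in order if bits[i] and old_color[i] + r < P]
    wrapped = [i for i in order if bits[i] and old_color[i] + r >= P]
    new_order = list(merge(unchanged, shifted, wrapped,
                           key=lambda i: (new_color[i], i)))
    # Step 5b: equal new colors are adjacent, so checking neighbours suffices.
    for x, y in zip(new_order, new_order[1:]):
        if new_color[x] == new_color[y] and old_color[x] != old_color[y]:
            raise RuntimeError("color collision detected")
    return new_color, new_order
```

When the check fails, the caller would fall back to the alternative protocol mentioned in step 6.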



Verifying a Deletion. We now show how to verify the deletion of partition 
$P_{del}[1 \dots n]$ in step t. Let $P_{x_1} \dots P_{x_m}$ be the existing partitions in the system 
after $P_{del}[1 \dots n]$ is deleted. Let $P_{x_\ell}$ be the partition that appeared in S just 
before $P_{del}$. 

Verifying deletions is more complicated for the following reasons. Suppose 
that after the deletion of a partition $P_{del}[1 \dots n]$, two elements i and j have the 
same color. We do not know a priori whether this is because the last partition 
separating i and j has been removed, or whether i and j are erroneously assigned 
the same color and are in fact separated by many partitions. Thus we verify 
deletions as follows. 



4. Verify that if $C_t[i] = C_t[j]$, then i and j are in the same set of all 
partitions $P_{x_1} \dots P_{x_m}$ (except for $P_{del}[1 \dots n]$). 

5. If so, continue. If not, an error is found. Spawn off an independent 
execution or run any other alternative protocol. 



We now show how to make the verification efficient. Note that step 4 could 
potentially be very expensive because it may involve scanning through the entire 
list of partitions $P_{x_1} \dots P_{x_m}$. Despite this, we show that the amortized cost of 
verifying a deletion is O(n α(n)), where α(n) is the inverse Ackermann function. 
To do this we will show that for verification, each partition is examined O(n) 
times and each examination requires amortized time O(α(n)). As with insertions, 
we maintain the elements in sorted order and this again involves merging three 
sorted lists. (The merge is minimally different; we now add instead of subtracting 
and subtract instead of adding.) 

We maintain a Union-Find data structure for each prefix of partitions 
$P_{x_1} \dots P_{x_m}$. Thus, two elements i and j belong to the same set in the data 
structure Union-Find$^{(x_\ell)}$ if they belong to the same set in all of the partitions 
$P_{x_1} \dots P_{x_\ell}$. Because partitions are always inserted at the end of S, the sets in 
each Union-Find$^{(x_\ell)}$ data structure may coalesce when partitions are deleted 
but will never split apart. The sets can combine together at most n − 1 times 
before a single set remains and consequently all elements have the same color. 
The operation Find-Set$^{(x_\ell)}[i]$ locates the smallest element belonging to the 
same set as element i (in the Union-Find$^{(x_\ell)}$ data structure). The operation 
Union$^{(x_\ell)}[i, j]$ combines the set containing i and the set containing j (in the 
Union-Find$^{(x_\ell)}$ data structure). 

The algorithm appears in the full version of this paper. 

Theorem 3. Verifying an insertion or a deletion requires amortized time 
O(n α(n)). 

If a verification identifies an error, then we run an alternative protocol. Be- 
cause the probability of an error is polynomially small, we obtain the following 
theorem. 

Theorem 4. Insertions and deletions run in amortized time O(n α(n)), both 
expected and w.h.p. 

4 Estimating the Number of Partitions Separating 
Elements 

When building a decision tree (e.g., for OCR), one may want to maintain more 
detailed information besides the induced partition. In particular, for fault 
tolerance, one may insist that each pair of elements be separated by at least k partitions. 
Thus, for each pair of elements the data structure could store and return the 
number of partitions that separate the elements. Unfortunately, the naive 
solution to this problem requires $O(n^2)$ time for insert and delete. 

On the other hand, it may not be necessary to know the exact number of 
partitions that separate elements i and j. In applications such as building deci- 
sion trees, approximate knowledge of this number may be satisfactory. This is 
the problem we explore in this section. 

Our data structure supports the following three operations. 

• Insert(P,S) - add a new partition P to the set of partitions S. 

• Delete(P,S) - delete existing partition P from the set of partitions S. 

• Query(i, j, S) - output an estimate of the number of partitions separating 
elements i and j. 
We show that the operations insert and delete can be implemented to run in 
worst-case time O(n log n), or expected O(n) time. Query requires polylogarithmic 
time and returns an answer that is accurate to within any constant factor. 
As it is stated, this algorithm works only for bipartitions. 

As in Section 3, we assign integers called colors to elements. However, now 
the distance between two colors approximates the number of partitions 
separating elements. More specifically, each element i has β log n separate colors, 
C[i][1 . . . β log n]. Each color is modified independently. 

Inserting a partition. A new partition is supplied as an array A[1 . . . n], where 
$A[j] \in \{0, 1\}$. Note that each bipartition has two representations, where one 
representation is the complement of the other. To insert a partition, we 
independently modify each of the β log n colors of the elements as follows. 

• Randomly choose one of the two representations of the partition for each 
of the β log n colors, and 

• add these values to the composite colors of the elements. 

Thus in the i-th position some of the colors are incremented by 1 and some of 
the colors remain the same. 

Deleting a partition. To delete a partition, we again modify each of the β log n 
colors inversely to the changes on inserting the partition. For each color, we 
subtract the appropriate representations of the partition from the composite 
color so that each color either remains unmodified or decreases by 1. 

Querying elements i and j. To estimate the number of partitions separating elements 
i and j, we compare the colors of i and j. Let $K_t$ be the number of partitions in 
the system at time t, and let $E(C[i][\ell])$ be the expected value of $C[i][\ell]$. Notice 
that, for all colors of all elements, the expected value of the color is $K_t/2$. 

However, the actual colors will deviate from this expected value. If no sets 
separate i and j, then for all ℓ = 1 . . . β log n, C[i][ℓ] = C[j][ℓ]. The less the values 
of C[i][ℓ] and C[j][ℓ] are correlated, the more sets separate i and j. Specifically, if 
there are d sets that separate elements i and j, then we can view the process of 
choosing colors for i and j as a random walk of length d in the following sense. 

Consider the basic random walk on the integer line. A walker starts at the 
origin and at each step t, moves one unit to the right with probability 1/2 and 
moves one unit to the left with probability 1/2. We compare this random walk 
with the dynamics of the algorithm. Whenever i and j are in different parts of 
an inserted partition, then with probability 1/2 the color of i is incremented by 1 
and the color of j stays the same, and with probability 1/2 the color of j is 
incremented by 1 and the color of i stays the same. 

Thus, the probability 



$\Pr[\,C[i][\ell] - C[j][\ell] = z\,]$ 

is exactly the probability that a random walk of length d ends at integer z. Thus, 
by examining the distribution of the colors of the elements, we can estimate the 
most likely value of d. The estimation method appears in the full version of this 
paper. We obtain the following theorem. 

Theorem 5. For any error parameter ε and constant c, there is a constant β 
such that the following holds with probability $1 - 1/n^c$ (w.h.p.): If the number 
of sets separating elements i and j is d, then the estimate d' of d is bounded as 
follows: 

$(1 - \varepsilon)d \le d' \le (1 + \varepsilon)d$. 
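
The random-walk view suggests a simple experiment. The sketch below stores the colors naively (so an update costs time proportional to n times the number of colors, not the bit-packed bound discussed next), and its estimator is an assumption of this sketch rather than the method from the full version: since the difference C[i][ℓ] − C[j][ℓ] is the endpoint of a ±1 random walk of length d, its second moment equals d, so the mean of the squared differences over all colors is used as the estimate. The names `SeparationEstimator`, `num_colors`, and `query` are illustrative.

```python
import random


class SeparationEstimator:
    def __init__(self, n, num_colors, seed=None):
        self.n = n
        self.m = num_colors                     # plays the role of beta * log n
        self.rng = random.Random(seed)
        self.color = [[0] * num_colors for _ in range(n)]
        self.stored = {}                        # handle -> (bits, flips)
        self.next_handle = 0

    def _apply(self, bits, flips, sign):
        for i in range(self.n):
            row = self.color[i]
            for l, flip in enumerate(flips):
                # flip selects the partition itself or its complement.
                row[l] += sign * (bits[i] ^ flip)

    def insert(self, bits):
        flips = [self.rng.randrange(2) for _ in range(self.m)]
        self._apply(bits, flips, +1)
        handle = self.next_handle
        self.next_handle += 1
        self.stored[handle] = (list(bits), flips)
        return handle

    def delete(self, handle):
        bits, flips = self.stored.pop(handle)
        self._apply(bits, flips, -1)

    def query(self, i, j):
        """Estimate the number of stored partitions separating i and j."""
        diffs = [a - b for a, b in zip(self.color[i], self.color[j])]
        return sum(d * d for d in diffs) / self.m
```

The estimate is exactly 0 for a pair that no partition separates, and its accuracy for separated pairs improves as the number of colors grows.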

To implement the insert and delete operations in expected O(n log log n) 
time, we must show how to increment the β log n colors by the prescribed random 
bits in expected log log n time. This can be done by maintaining these colors as 
log k sets of 2β integers, each of size log n. The first set of integers contains 
the least significant bit of (log n)/2 colors, with each data bit flanked by zero 
bits. Adding the random bits (similarly padded) to this integer increments each 
of the colors simultaneously. If any carries occur, they appear as 1 bits in the 
padded region, and require us to increment the appropriate next significant bits. 
To avoid bad situations, we use a nonunique representation of numbers. Each 
round of incrementing takes constant time, and the expected number of levels 
to propagate the carries is log log n. 

5 Maintaining Geometric Set Partitions 

Suppose the elements in each set partition were points in the plane, and that 
each set partition was induced by a half-plane that distinguishes between the 
points which lie to the left or right of the defining line. Such partitions have 
been previously studied. For example, Freimer et al. prove it is NP-complete 
to find the smallest subset of lines sufficient to completely shatter a point set, 
i.e. induce a complete partition of the points. 

Clearly, the data structure problem can be solved by testing all of the half- 
planes against each point and reducing it to a non-geometric instance. However, 
exploiting the geometry can make the problem easier. The following operations 
should be supported by such a data structure. 

— Insert-line(l,S) - add a separating line l to the arrangement of partitions 
and points S. 

— Delete-line(l,S) - delete existing separating line l from the arrangement of 
partitions and points S. 

— Report(S) - report the set partition of U induced by the arrangement of 
partitions and points S. 

A naive way to support these operations is by maintaining the arrangement 
of the halfplanes. Any two points in the same cell of the arrangement represent 
unpartitioned elements. A line insertion/deletion into the arrangement takes 
O(k + n) time, the former term for inserting a line in an arrangement of k lines 
0 and the latter term to partition the lists of points in the split. And report takes 
O(n) time because we can maintain a list of non-empty cells in the arrangement 
with each cell maintaining a list of points in it. 

This naive algorithm is faster than the algorithms in the non-geometric 
setting when k is O(n), but its performance degrades for larger k. An important 
drawback of this algorithm is that it is not space efficient because it uses $O(k^2)$ 
space to store the arrangement. We improve this naive algorithm so that 
insert and delete run in sublinear time. The algorithm uses spanning trees of low 
stabbing number and randomization. 

Theorem 6. Geometric set partitions can be maintained with report in O(n) 
and insert/delete in $O(\sqrt{n}\log n)$ time. The data structure uses O(n + k) space 
and 0(n^ ®logn) preprocessing time. The algorithm runs correctly in polynomial 
time w.h.p. 

Proof. First we preprocess the n points to obtain a spanning tree of low 
stabbing number. We can find a spanning tree T of stabbing number $O(\sqrt{n})$ in 
log n) time [311 ,'tlj . We orient each edge of T arbitrarily. In addition we 
maintain an integer or color for each edge, initially 0. 

To insert a line L we first associate an integer or color with L. As in the 
non-geometric randomized algorithms, the color is a randomly chosen integer 
between 1 and P − 1, where P is $O(n^c)$ for some constant c. Then we find the set 
E of $O(\sqrt{n})$ edges of tree T stabbed by L. For each edge e in E we either add 
or subtract the color of L modulo P from the color of e. We add if e goes from 
left to right of L and subtract otherwise. Finding the set E takes $O(\sqrt{n}\log n)$ 
time [3], and hence insertions take $O(\sqrt{n}\log n)$ time. 

To delete a line L we find the set E of $O(\sqrt{n})$ edges of tree T stabbed by L. 
Then we subtract or add modulo P the color of L from the color of each edge e 
of the set E. We subtract if e goes from left to right of L and add otherwise. As 
for insertions, finding the set E and hence deletions take $O(\sqrt{n}\log n)$ time. 

To report we start with any node of T and assign it an integer or color 0.
We then traverse the tree and assign each node of T a color as follows. If a
node has color $c_0$ and has an outgoing edge $e_1$ with color $c_1$, then the target of
$e_1$ gets the color $(c_0 + c_1) \bmod P$. Similarly, for an incoming edge $e_2$ with color $c_2$,
the source node gets the color $(c_0 - c_2) \bmod P$. After assigning each node a
color, we radix sort the nodes by color and report the partition of nodes induced by
their colors. Clearly, this can be performed in $O(n)$ time.

Note that this works because for two points in the same cell of the arrangement of
lines, the unique path in T between them will intersect any line an even number
of times. See Figure 1. Thus while moving from one point to another we would
add and subtract colors of the lines an equal number of times and hence points
in the same cell will have the same color. On the other hand, if two points are
in different cells the unique path in T between them would intersect at least one
line an odd number of times. Hence w.h.p. the two points will have different colors.

Details appear in the full paper. □ 
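
The coloring scheme of the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `GeometricPartition`, the line representation `(a, b, c)` for $ax + by + c = 0$, and the default modulus are all hypothetical choices; stabbed edges are found by a naive linear scan rather than in $O(\sqrt{n} \log n)$ time; the spanning tree is supplied by the caller and need not have low stabbing number; and blocks are grouped with a dictionary instead of a radix sort. It also assumes that no point lies on an inserted line.

```python
import random
from collections import defaultdict


class GeometricPartition:
    """Monte Carlo maintenance of the partition of points induced by lines."""

    def __init__(self, points, tree_edges, modulus=(1 << 61) - 1, seed=None):
        self.points = points                  # list of (x, y); point 0 is the root
        self.edges = tree_edges               # directed spanning-tree edges (u, v)
        self.P = modulus                      # colors are integers modulo P
        self.edge_color = [0] * len(tree_edges)
        self.line_color = {}                  # line (a, b, c) -> its random color
        self.rng = random.Random(seed)

    def _side(self, line, p):
        a, b, c = line
        x, y = self.points[p]
        return a * x + b * y + c              # the sign gives the side of the line

    def _apply(self, line, sign):
        color = self.line_color[line]
        for idx, (u, v) in enumerate(self.edges):   # naive scan for stabbed edges
            su, sv = self._side(line, u), self._side(line, v)
            if su < 0 < sv:                   # edge goes from negative to positive side
                self.edge_color[idx] = (self.edge_color[idx] + sign * color) % self.P
            elif sv < 0 < su:                 # edge goes the other way
                self.edge_color[idx] = (self.edge_color[idx] - sign * color) % self.P

    def insert(self, line):
        self.line_color[line] = self.rng.randrange(1, self.P)
        self._apply(line, +1)

    def delete(self, line):
        self._apply(line, -1)
        del self.line_color[line]

    def report(self):
        adj = defaultdict(list)
        for idx, (u, v) in enumerate(self.edges):
            adj[u].append((v, self.edge_color[idx]))    # along the edge: add
            adj[v].append((u, -self.edge_color[idx]))   # against the edge: subtract
        color = {0: 0}
        stack = [0]
        while stack:
            u = stack.pop()
            for v, delta in adj[u]:
                if v not in color:
                    color[v] = (color[u] + delta) % self.P
                    stack.append(v)
        blocks = defaultdict(list)
        for p in sorted(color):
            blocks[color[p]].append(p)        # equal color = same cell (w.h.p.)
        return sorted(blocks.values())
```

With two lines of colors $c_1$ and $c_2$, a point receives color 0, $c_1$, $c_2$ or $c_1 + c_2$ (mod P) depending on which of the two lines separate it from the root, so distinct cells collide only if the random colors happen to satisfy a linear relation modulo P.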

Fig. 1. Monte Carlo algorithm for geometric set partitioning (legend: points, spanning tree, partitioning lines).

Although the off-line version of constructing the induced set partition of k set
partitions of n elements can easily be solved in optimal $O(kn)$ time by repeated
application of Lemma H the geometric version of the off-line problem is more 
interesting. Here we are given a set of k lines and n points, and seek to determine 
the induced set partition of them. We obtain the following theorems. The proofs 
appear in the full version. 

Theorem 7. The induced set partition of k lines and n points in the plane can 
be determined in total ® -I- k^/n) time and 0{n + k) space w.h.p. 



Theorem 8. The induced set partition of k lines and n points in the plane can 
he determined in total 0{k^'^ log‘^ k + n'/k log^ k) time and O(fclog^fc) space, 
where to is a constant less than 4-3. 

6 Maintaining Sorted Strings under Character 
Insertion/Deletion 

The problem of determining the partition induced by a collection of k set
partitions on $\{1, \ldots, n\}$ can easily be reduced to that of sorting strings. Arbitrarily
assign each of the parts of each partition p a distinct number from 0 to $parts(p) - 1$.
Define $S_i$ to be the string of parts associated with element i, i.e. $S_i[j] = q$ iff
element i is in part q of set partition j. Elements x and y are indistinguishable iff
$S_x = S_y$. Hence sorting the strings $\{S_1, \ldots, S_n\}$ groups the elements into blocks of
equivalence classes. This gives us an alternate, more general formulation which
yields our original problem as a special case: maintain the sorted order of strings
where we are allowed to (1) delete the ith character from each string and (2)
append an extra character to each string.
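
The reduction can be illustrated with a short Python sketch. The function name `induced_partition` and the input layout (one list of part numbers per set partition, elements numbered from 0) are assumptions made for this example only; it uses a comparison sort rather than any of the data structures discussed below.

```python
def induced_partition(partitions, n):
    """partitions[j][i] is the part number of element i in set partition j."""
    # S_i is the string (here: tuple) of part numbers of element i.
    strings = [tuple(p[i] for p in partitions) for i in range(n)]
    order = sorted(range(n), key=lambda i: strings[i])
    blocks = []
    for i in order:
        # Equal strings are adjacent after sorting, so they form one block.
        if blocks and strings[blocks[-1][-1]] == strings[i]:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks
```

For example, the two partitions `[0, 0, 1, 1]` and `[0, 1, 1, 1]` of four elements give the strings (0,0), (0,1), (1,1), (1,1), so only the last two elements remain indistinguishable.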

In general, we have the problem of maintaining a set of strings $S = \{S_1, \ldots, S_n\}$
under the following operations:

• Report(S) - Return the permutation of S representing the sorted order
of the strings, with runs of duplicates identified.

• Insert(S, i, T) - Insert character T[j] after the ith position of each string $S_j$,
$1 \le j \le n$. This increases the length of each string in S by one character.

• Delete(S, i) - Delete the ith character of each string $S_j$, $1 \le j \le n$. This
decreases the length of each string in S by one character.

A data structure to implement these operations efficiently would yield a data
structure for maintaining dynamic set partitions as a special case. We propose
a series of data structures based on suffix trees to efficiently support a
restricted set of these operations. In particular, we build our data structure around
Ukkonen’s linear time suffix tree construction algorithm. In Ukkonen’s
algorithm, suffixes are inserted into the tree from left to right. Analogously, we can
continue to append new characters onto the end of a string by simulating the
insertion of another subsequent suffix.

We will augment this suffix tree to support constant-time least common
ancestor queries. Cole and Hariharan demonstrate how to maintain
constant-time least common ancestor queries in trees supporting leaf insertion/deletion
and edge-splitting updates.

Theorem 9. A data structure can support the operations of head deletion, tail
insertion, and sorted-order reporting in times $O(n)$, amortized $O(n)$, and
$O(n + k)$ respectively.

The proof of Theorem 9 appears in the full version of this paper. The data
structure of Theorem 9 can be used to efficiently obtain the shortest contiguous
discriminating run of k preordered tests, which suggests a new heuristic
approach to finding small sets of discriminating features. We repeatedly insert
new features in the given order until they first suffice to completely discriminate
the n strings. At this point, we repeatedly delete the prefix characters of each
string, until the refined partition contains fewer than n parts. By interleaving
these phases of insertion and deletion, and maintaining the boundaries of the
shortest discriminating run encountered along the way, we obtain:

Corollary 1. The shortest contiguous discriminating run of k ordered tests can
be computed in $O(k(n + k))$ time.
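
The interleaving of insertions and deletions is a two-pointer sweep over the ordered tests. The following Python sketch shows only that sweep: the function name `shortest_discriminating_run` and the input layout are hypothetical, and the discrimination check is done naively by rebuilding the strings of the current window, so this sketch does not achieve the bound of Corollary 1 (that requires the data structure of Theorem 9).

```python
def shortest_discriminating_run(tests):
    """tests[j][i] is the outcome of test j on element i.

    Returns the half-open window (lo, hi) of the shortest contiguous run of
    tests that separates all elements, or None if no run does.
    """
    k = len(tests)
    n = len(tests[0]) if k else 0

    def discriminates(lo, hi):
        # Naive check: all n elements must get distinct outcome strings.
        seen = {tuple(tests[j][i] for j in range(lo, hi)) for i in range(n)}
        return len(seen) == n

    best = None
    lo = 0
    for hi in range(1, k + 1):                # "insert" test hi - 1 at the tail
        if not discriminates(lo, hi):
            continue
        while lo + 1 < hi and discriminates(lo + 1, hi):
            lo += 1                           # "delete" tests from the head
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best
```

The sweep is valid because discrimination is monotone: every window containing a discriminating window is itself discriminating, so the head pointer never has to move backwards.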

The presence of arbitrary deletions significantly complicates the problem of 
maintaining sorted order. However, we have reasonable results when the number 
of deletions is small: 

Theorem 10. A data structure can support the operations of arbitrary deletion,
tail insertion, and sorted-order reporting in times $O(n)$, (amortized) $O(n)$, and
$O(dn \lg n)$ respectively, where d is the total number of deletions which have been
performed.

Abstract. We present an algorithm for computing the domination number
of a planar graph that uses $O(c^{\sqrt{k}} n)$ time, where k is the domination
number of the given planar input graph and $c = 3^{6\sqrt{34}}$. To obtain this
result, we show that the treewidth of a planar graph with domination
number k is $O(\sqrt{k})$, and that such a tree decomposition can be found in
$O(\sqrt{k}\, n)$ time. The same technique can be used to show that the DISK
DIMENSION problem (find a minimum set of faces that cover all vertices
of a given plane graph) can be solved in $O(c_1^{\sqrt{k}} n)$ time for $c_1 = 2^{6\sqrt{34}}$.
Similar results can be obtained for some variants of DOMINATING SET,
e.g., INDEPENDENT DOMINATING SET.



1 Introduction 



A k-dominating set D of an undirected graph G is a set of k vertices of G such
that each of the rest of the vertices has at least one neighbor in D. The minimum k
such that the graph G has a k-dominating set is called the domination number
of G.
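
As a concrete reading of these definitions, the following Python sketch checks whether a vertex set is dominating and finds the domination number by exhaustive search. The function names and the adjacency-dictionary representation are assumptions of this example; the search is exponential and is meant only to make the definition executable, not to reflect the algorithm of this paper.

```python
from itertools import combinations


def is_dominating_set(adj, D):
    """adj maps each vertex to the set of its neighbours."""
    D = set(D)
    # Every vertex is in D or has at least one neighbour in D.
    return all(v in D or bool(D & adj[v]) for v in adj)


def domination_number(adj):
    """Smallest k admitting a k-dominating set, with one witness set."""
    vertices = list(adj)
    for k in range(len(vertices) + 1):
        for cand in combinations(vertices, k):
            if is_dominating_set(adj, cand):
                return k, set(cand)
```

For instance, a cycle on five vertices has domination number 2, while a star has domination number 1 (its center).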

The k-DOMINATING SET problem, i.e., the task to decide, given a graph G =
(V, E) and a positive integer k, whether or not there exists a k-dominating
set, is among the core problems in algorithms, combinatorial optimization, and
computational complexity [ I I f 1 61 1 DI24j . The problem is NP-complete, even when 
restricted to planar graphs with maximum vertex degree 3 and to planar graphs 
that are regular of degree 4 nn|. 

The approximability of the DOMINATING SET problem has received considerable
attention. It is not known and is not believed that DOMINATING SET
for general graphs has a constant factor approximation algorithm (see Crescenzi
and Kann for details). However, the PLANAR DOMINATING SET problem (i.e.,
the DOMINATING SET problem restricted to planar graphs) possesses a polynomial
time approximation scheme. That is, there is a polynomial time approximation
algorithm with approximation factor $1 + \varepsilon$, where $\varepsilon$ is a constant arbitrarily
close to 0. However, the degree of the polynomial grows with $1/\varepsilon$. Hence, applying
the approximation scheme does not always lead to practical solutions and
finding an “efficient” exact algorithm for PLANAR DOMINATING SET is therefore
of interest.

Due to the hardness and relevance of dominating set, numerous papers 
have studied special cases of dominating set, e.g., connected dominating set, 
total dominating set, independent dominating set, dominating clique, and/or the 
complexity of the problem in special graph classes |til8ll()ll5ll7ll8l22l2ti| . For 
example, a very recent result shows that there is a factor $2 + \varepsilon$ approximation
algorithm for DOMINATING SET on the class of circle graphs.

Lately, it has become popular to cope with computational intractability in
a different way besides approximation: parameterized complexity. Here, the
basic observation is that, for many hard problems, the seemingly inherent
combinatorial explosion can be restricted to a “small part” of the input, the parameter.
For instance, the VERTEX COVER problem can be solved by an algorithm with
running time $O(kn + 1.3^k)$ [9, 23], where the parameter k is a bound on the
maximum size of the vertex cover set we are looking for and n is the number
of vertices in the given graph. The fundamental assumption is $k \ll n$. As can
easily be seen, this yields an efficient, practical algorithm for small values of k. A
problem is called fixed parameter tractable if it can be solved in time $f(k)\, n^{O(1)}$
for an arbitrary function f which depends only on k. Unfortunately, according to
the theory of parameterized complexity it is very unlikely that the DOMINATING
SET problem is fixed parameter tractable. On the contrary, it was proven to be
complete for W[2], a “complexity class of parameterized intractability” (refer to
Downey and Fellows for any details). However, PLANAR k-DOMINATING SET
is fixed parameter tractable. Downey and Fellows state an $O(11^k n)$ time
bound for this problem, where n is the number of vertices.
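
To illustrate how the combinatorial explosion can be confined to the parameter, here is a minimal Python sketch of the textbook bounded search tree for VERTEX COVER. It is not the $O(kn + 1.3^k)$ algorithm cited above: it simply branches on the two endpoints of an uncovered edge, which gives a search tree with at most $2^k$ leaves. The function name and the edge-list representation are assumptions of this example.

```python
def vertex_cover_at_most_k(edges, k):
    """Return a vertex cover of size at most k as a set, or None if none exists."""
    if not edges:
        return set()                  # nothing left to cover
    if k == 0:
        return None                   # edges remain but the budget is used up
    u, v = edges[0]
    for pick in (u, v):               # one endpoint of this edge must be chosen
        rest = [e for e in edges if pick not in e]
        sub = vertex_cover_at_most_k(rest, k - 1)
        if sub is not None:
            return sub | {pick}
    return None
```

The recursion depth is at most k and each call does work linear in the number of edges, so the exponential part of the running time depends on k alone.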

In this paper, we present a drastic asymptotic improvement of this result.
We show that PLANAR DOMINATING SET can be solved in time $O(c^{\sqrt{k}} n)$ for
some constant c. To the best of our knowledge, this is the first fixed parameter
tractability result where the exponent of the exponential term is not growing
linearly, but with the square root of the parameter. We show that a planar graph with
a dominating set of size k has treewidth $O(\sqrt{k})$, and we use this to solve PLANAR
DOMINATING SET using the corresponding tree decomposition of the graph.
Unfortunately, the constant base c of the exponential term that appears in the
running time of our algorithm still is quite large, namely $c = 3^{6\sqrt{34}}$. However,
the authors are confident that a more refined analysis of the applied techniques
can improve this constant considerably.

Our technique can also be used to significantly improve a known bound for
the DISK DIMENSION problem [2, 25]. The problem is defined as follows [2, 25]:
Given a plane graph G, i.e., a graph with a fixed embedding in the plane, and
a positive integer k, is there a set of at most k faces (disks), such that all of
the graph vertices are covered? The problem is NP-complete [2]. Downey and
Fellows gave an $O(12^k n)$ algorithm for this problem. For a slightly more
general version of the problem, Bienstock and Monma showed that there is a
time $O(c^k n)$ algorithm, where c is an unspecified constant. In this paper, we give
an algorithm that solves DISK DIMENSION in time $O(c_1^{\sqrt{k}} n)$ for some constant $c_1$.
We also discuss some variants of the DOMINATING SET problem.



2 Preliminaries 

In this section, we provide necessary notions and some known results. We assume 
familiarity with basic graph-theoretical notation. 

Definition 1 A graph G is outerplanar if there is a crossing-free embedding of 
G in the plane such that all vertices are on the same face. 



Definition 2 A graph G is r-outerplanar if, for r = 1, G is outerplanar or, for
r > 1, G has a planar embedding such that if all vertices on the exterior face
(which form the exterior layer $L_1$) are deleted, the connected components of the
remaining graph are all at most $(r - 1)$-outerplanar.

In this way, we may speak of the layers $L_1, \ldots, L_r$ of an r-outerplanar graph.
One easily makes the following central observation:

Proposition 1. If a planar graph G = (V, E) has a k-dominating set, then it
can be at most 3k-outerplanar.

The main tool we use in our algorithm is a suitable tree decomposition: 

Definition 3 Let G = (V, E) be a graph. A tree decomposition of G is a pair
$(\{X_i \mid i \in I\}, T)$, where each $X_i$ is a subset of V and T is a tree with the elements
of I as nodes. The following three properties should hold:

• $\bigcup_{i \in I} X_i = V$,

• for every edge $\{u, v\} \in E$, there is an $i \in I$ such that $\{u, v\} \subseteq X_i$,

• for all $i, j, k \in I$, if j lies on the path between i and k in T, then $X_i \cap X_k \subseteq X_j$.

The width of $(\{X_i \mid i \in I\}, T)$ equals $\max\{|X_i| \mid i \in I\} - 1$. The treewidth of G
is the minimal k such that G has a tree decomposition of width k.
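
The three properties translate directly into a checker. The Python sketch below is an illustration with hypothetical names (`is_tree_decomposition`, `width`); bags are given as a dictionary from tree nodes to vertex sets. It verifies the third property in the equivalent form that, for every vertex v, the tree nodes whose bags contain v induce a connected subtree of T.

```python
from collections import defaultdict, deque


def _connected(nodes, adj):
    """True if `nodes` induce a connected subgraph of the tree given by adj."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen == nodes


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    adj = defaultdict(set)
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    # T must be a tree on the index set I.
    if len(tree_edges) != len(bags) - 1 or not _connected(bags, adj):
        return False
    # Property 1: the bags cover V.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Property 2: every edge lies inside some bag.
    if not all(any({u, v} <= bag for bag in bags.values()) for u, v in edges):
        return False
    # Property 3: bags containing a fixed vertex form a connected subtree.
    return all(_connected([i for i, bag in bags.items() if v in bag], adj)
               for v in vertices)


def width(bags):
    return max(len(bag) for bag in bags.values()) - 1
```

A path a-b-c-d, for example, has the decomposition {a,b}, {b,c}, {c,d} arranged along a path, which has width 1.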

In Table 2, page 550] or 0 Theorem 83], we can find: 

Proposition 2. An r-outerplanar graph has treewidth of at most $3r - 1$.

Propositions 1 and 2 imply that a planar graph with domination number k has
bounded treewidth, or, more precisely, its treewidth is bounded by $9k - 1$, but
we will give a stronger bound later.

Theorem 4. If a tree decomposition of width at most $\ell$ of a graph is known,
then a minimum dominating set can be determined in time $O(3^{\ell} n)$, where n is the
number of nodes of the tree decomposition.

Comments on the proof: The theorem can be proved by using dynamic pro- 
gramming techniques. For each bag (i.e., each set Xi of the corresponding tree 
decomposition) one keeps a table. These tables store, for every vertex in the bag, 
the information of whether that vertex is assumed to belong to either the dom- 
inating set, the (known) set of dominated vertices, or the set of vertices whose 
status is unknown at the given point. Since 1X^1 < £, the table size for each bag 
is bounded by 3^. See e.g., j4l2Yj . 
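
The three vertex states used in these tables can be seen in the simplest special case, where the graph itself is a tree. The Python sketch below is not the bag-table algorithm of the theorem; it is the classical tree dynamic program, shown only because it uses the same three states per vertex: in the dominating set, already dominated from below, or still waiting to be dominated (here necessarily by its parent). The function name and the adjacency-dictionary input are assumptions of this example.

```python
def tree_domination_number(adj, root):
    """Domination number of a tree given as vertex -> iterable of neighbours."""
    INF = float("inf")
    parent = {root: None}
    order = [root]
    for v in order:                       # BFS; the list grows while we iterate
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)
    children = {v: [] for v in order}
    for v in order[1:]:
        children[parent[v]].append(v)

    in_set, dominated, waiting = {}, {}, {}
    for v in reversed(order):             # children are processed before parents
        kids = children[v]
        # v is in the set: children may be in any state.
        in_set[v] = 1 + sum(min(in_set[c], dominated[c], waiting[c]) for c in kids)
        # v waits for its parent: no child is in the set, so all are dominated below.
        waiting[v] = sum(dominated[c] for c in kids)
        # v is dominated by a child: at least one child must be in the set.
        if kids:
            base = sum(min(in_set[c], dominated[c]) for c in kids)
            extra = min(in_set[c] - min(in_set[c], dominated[c]) for c in kids)
            dominated[v] = base + extra
        else:
            dominated[v] = INF
    return min(in_set[root], dominated[root])
```

On a general tree decomposition the same three states are recorded for every vertex of a bag simultaneously, which is where the factor exponential in the bag size comes from.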

In this way, a straightforward solution to the PLANAR DOMINATING SET problem
using tree decompositions leads to an algorithm which runs in time $O(3^{9k} n)$.
(For a graph G = (V, E), there always is a tree decomposition with optimal width
and with at most |V| nodes.) Downey and Fellows suggested an idea that
leads to a faster search tree algorithm. They state an algorithm with running
time $O(11^k n)$ (without using tree decompositions).

In what follows, we show that a planar graph with a k-dominating set has
treewidth $O(\sqrt{k})$. Combining this with Theorem 4 gives a significant asymptotic
improvement of the result of Downey and Fellows.



To understand the following technique, it is helpful to consider the concept
of a layer decomposition of an r-outerplanar graph G. It is a forest of height r
which is defined as follows: the nodes of the trees are sets of vertices of G and
the different trees correspond to different components of G. In general, the ith
layer of the layer decomposition forest defines a set of vertices $L_i$, namely the
ith layer of G.

Consider now the ith layer of the forest, i.e., the nodes of level i in the
decomposition forest, consisting of, possibly, several vertex sets $C_{i,1}, \ldots, C_{i,\ell_i}$.
In other words, $L_i = \bigcup_{j=1}^{\ell_i} C_{i,j}$. The vertex sets $C_{i,1}, \ldots, C_{i,\ell_i}$ correspond to the
vertices of different components of the subgraph induced by $L_i$. We refer to $C_{i,j}$
as a layer-component. In particular, the first layer consists of layer-components
each of which equals the vertices from $L_1$ of one particular component.

A layer-component $C_{i,j}$ of layer $L_i$ is called non-empty if it contains vertices
from layer $L_{i+1}$ in its interior.

Definition 5 Let $\emptyset \ne C \subseteq C_{i,j}$ be a subset of a non-empty layer-component
$C_{i,j}$ of layer i, where $i \ge 2$. Then the unique cycle B(C) in layer $L_{i-1}$, such that
C is contained in the region enclosed by B(C) and no other vertex of layer $L_{i-1}$
is contained in this region, is called the boundary cycle of C.

The existence and uniqueness of such a boundary cycle B(C) is easy to see.

3 Domination versus Treewidth

Our algorithm is based on Theorem 4. Therefore, in the following we show that a
planar graph with domination number k has treewidth of at most $O(f(k))$, where
f(k) is a sublinear function, which we are going to determine. Here, the main
idea is to find small separators of the graph and merge the tree decompositions
of the resulting subgraphs. To this end, the following observation is used.

Proposition 3. If a connected graph can be decomposed into components of 
treewidth of at most t by means of a separator of size s, then the whole graph 
has treewidth of at most t + s. 

The proof is quite simple: Just merge the separator to every node in each tree 
decomposition of width at most t which correspond to the distinct components. 
Then add some arbitrary connections between the trees corresponding to the 
components in order to form a tree decomposition of the whole graph. 
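
The merging step can be written out as a short Python sketch. The function name `merge_with_separator` and the representation of a tree decomposition as a pair (bags, tree edges) are assumptions of this example; the "arbitrary connections" are realized by linking one node of each component's tree to one node of the previous component's tree.

```python
def merge_with_separator(decompositions, separator):
    """Combine tree decompositions of the components cut out by a separator.

    decompositions: list of (bags, tree_edges), one per component, where bags
    maps tree nodes to vertex sets. Returns (bags, tree_edges) for the whole graph.
    """
    separator = set(separator)
    bags, tree_edges = {}, []
    previous = None
    for idx, (comp_bags, comp_edges) in enumerate(decompositions):
        for node, bag in comp_bags.items():
            bags[(idx, node)] = set(bag) | separator     # add the separator everywhere
        tree_edges += [((idx, a), (idx, b)) for a, b in comp_edges]
        anchor = (idx, next(iter(comp_bags)))            # any node of this tree
        if previous is not None:
            tree_edges.append((previous, anchor))        # arbitrary connection
        previous = anchor
    return bags, tree_edges
```

Every bag grows by at most s vertices, so the width grows from at most t to at most t + s; since the separator vertices occur in every bag, the connectivity property of a tree decomposition is preserved no matter how the trees are linked.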



For planar graphs, there is an iterated version of this observation. 

Proposition 4. Let G be a planar graph with layers $L_i$ ($i = 1, \ldots, r$). For $i =
1, \ldots, \ell$ let $\mathcal{L}_i$ be a set of consecutive layers, i.e. $\mathcal{L}_i = \{L_{j_i}, L_{j_i+1}, \ldots\}$,
such that $\mathcal{L}_i \cap \mathcal{L}_{i'} = \emptyset$ for all $i \ne i'$. Moreover, suppose G can be decomposed into
components, each of treewidth of at most t, by means of separators $S_1, \ldots, S_\ell$,
where $S_i \subseteq \bigcup_{L \in \mathcal{L}_i} L$ for all $i = 1, \ldots, \ell$. Then G has treewidth of at most $t + 2s$,
where $s = \max_{i=1,\ldots,\ell} |S_i|$.

The proof again uses the merging techniques illustrated in the previous
proposition: Suppose, w.l.o.g., the sets $\mathcal{L}_i$ appear in successive order, i.e. $j_i < j_{i+1}$. For
each $i = 0, \ldots, \ell$, consider the component $G_i$ of treewidth at most t which is cut
out by the separators $S_i$ and $S_{i+1}$ (by default we set $S_0 = S_{\ell+1} = \emptyset$). We add $S_i$
and $S_{i+1}$ to every node in a given tree decomposition of $G_i$. In order to obtain
a tree decomposition of G, we successively add an arbitrary connection between
the trees $T_i$ and $T_{i+1}$ of the so-modified tree decompositions that correspond to
the subgraphs $G_i$ and $G_{i+1}$.

Finally, we still have to show how to construct (in polynomial time) a tree
decomposition of width f(k) matching our theoretical treewidth bound. This
allows us to apply Theorem 4 to actually determine the dominating set we are
aiming at.

The whole algorithm we present has time complexity $O(3^{f(k)} n)$. Since $f(k) \in
O(\sqrt{k})$, this obviously gives an asymptotic improvement of the $O(11^k n)$
algorithm presented by Downey and Fellows.

In the following, we assume that our graph has a fixed plane embedding with
r layers. We show that the treewidth cannot exceed f(k) if a dominating set of
size k is given.

Fig. 1. upper triples

Fig. 2. lower triples

Fig. 3. middle triples (two cases)

3.1 Separators and Treewidth 

We assume that we have a dominating set D of size at most k. Let $t_i$ be the
number of vertices of $D_i = D \cap L_i$. Hence, $\sum_{i=1}^{r} t_i \le k$. In order to avoid case
distinctions, we set $t_0 = t_{r+1} = t_{r+2} = 0$. Moreover, let $c_i$ denote the number of
non-empty layer-components of layer $L_i$.

We need some definitions for certain triples in the plane graph. These triples
are defined in a way such that the union of these triples will yield separators of
small size.

We define the triples for a layer $L_i$. The union of these triples separates
vertices of layer $L_{i-1}$ from vertices of layer $L_{i+2}$. For this purpose, in the following,
we write N(x) to describe the set of neighbors of a vertex x and use the notation
B(·) for boundary cycles as introduced in Definition 5.

Definition 6 An upper triple for layer $L_i$ is associated to a non-empty
layer-component $C_{i+1,j}$ of layer $L_{i+1}$ and a vertex $x \in D_{i-1}$ that has a neighbor
on the boundary cycle $B(C_{i+1,j})$ (see Fig. 1). Then, clearly, $x \in B(B(C_{i+1,j}))$,
by definition of a boundary cycle. Let $x_1$ and $x_2$ be the neighbors of x on the
cycle $B(B(C_{i+1,j}))$. Starting from $x_1$, we go around x up to $x_2$ so that we visit
all neighbors of x in layer $L_i$. We note the neighbors of x on the boundary
cycle $B(C_{i+1,j})$. Going around gives two outermost neighbors y and z on this
boundary cycle. The triple then is the three-element set {x, y, z}. In case x has
only a single neighbor y in $B(C_{i+1,j})$, the “triple” consists of only {x, y}.

For each non-empty layer-component $C_{i+1,j}$ of $L_{i+1}$ and each vertex $x \in D_{i-1}$
with neighbors in $B(C_{i+1,j})$, we obtain such an upper triple.



Definition 7 A lower triple for layer $L_i$ is associated to a vertex $x \in D_{i+1}$
and a non-empty layer-component $C_{i+1,j'}$ of layer $L_{i+1}$ (see Fig. 2). Suppose
x lies in layer-component $C_{i+1,j}$. We only consider layer-components $C_{i+1,j'}$ of
layer $L_{i+1}$ that are enclosed by the boundary cycle $B(C_{i+1,j})$. For each pair
$y, z \in B(C_{i+1,j}) \cap N(x)$ (where $y \ne z$), we consider the path $P_{y,z}$ from y to
z along the cycle $B(C_{i+1,j})$, taking the direction such that the region enclosed
by {z, x}, {x, y}, and $P_{y,z}$ contains the layer-component $C_{i+1,j'}$. Let $\{y, z\} \subseteq
B(C_{i+1,j}) \cap N(x)$ be the pair such that the corresponding path $P_{y,z}$ is shortest.
The triple, then, is the three-element set {x, y, z}. If x has no or only a single
neighbor y in $B(C_{i+1,j})$, then the “triple” consists only of {x}, or {x, y}.

For each vertex $x \in C_{i+1,j}$ of $D_{i+1}$ and each non-empty layer-component $C_{i+1,j'}$
that is enclosed by $B(C_{i+1,j})$, we obtain such a lower triple.



Definition 8 A middle triple for layer $L_i$ is associated to a non-empty
layer-component $C_{i+1,j}$ and a vertex $x \in D_i$ that has a neighbor in $B(C_{i+1,j})$ (see
Fig. 3). Note that, due to the layer model, it is easy to see that a vertex $x \in D_i$
can have at most two neighbors y, z in $B(C_{i+1,j})$. Depending on whether x itself
lies on the cycle $B(C_{i+1,j})$ or not, we obtain two different cases which both are
illustrated in Fig. 3. In either of these cases the middle triple is defined as the
set {x, y, z}. Again, if x has none or only a single neighbor y in $B(C_{i+1,j})$, then
the “triple” consists only of {x}, or {x, y}.

For each non-empty layer-component $C_{i+1,j}$ and each vertex $x \in D_i$, we obtain
such a middle triple.



Definition 9 We define the set Si as the union of all upper triples, lower triples 
and middle triples of Li. 

In the following, we will show that Si is a separator of the graph. Note that 
the upper bounds on the size of Si, which are derived afterwards, are crucial for 
the upper bound on the treewidth derived later on. 

Theorem 10. The set $S_i$ separates vertices of $L_{i-1}$ and $L_{i+2}$.

Fig. 4. $S_i$ separates $L_{i-1}$ and $L_{i+2}$.



Proof. Suppose there is a path P (with no repeated vertices) from layer $L_{i+2}$
to layer $L_{i-1}$ that avoids $S_i$. This clearly implies that there exists a path P'
from a vertex x in a non-empty layer-component $C_{i+1,j}$ of layer $L_{i+1}$ to a vertex
$z \in B(B(C_{i+1,j}))$ in layer $L_{i-1}$ which has the following two properties:

• $P' \cap S_i = \emptyset$.

• All vertices in between x and z belong to layer $L_i$ or to empty
layer-components of layer $L_{i+1}$.

(This can be achieved by simply taking a suitable subpath P' of P.) Let $y_1$ (and
$y_2$, respectively) be the first (last) vertex along the path P' from x to z that lies
on the boundary cycle $B(C_{i+1,j}) \subseteq L_i$ (see Fig. 4).

Obviously, $y_2$ cannot be an element of D, since, then, it would appear in
a middle triple of layer $L_i$ and, hence, in $S_i$. We now consider the vertex that
dominates $y_2$. This vertex can lie in layer $L_{i-1}$, $L_i$ or $L_{i+1}$.

Suppose first that $y_2$ is dominated by a vertex $d_1 \in L_{i-1}$. Then $d_1$ is in
$B(B(C_{i+1,j}))$, simply by definition of the boundary cycle (see Fig. 4). Since G
is planar, this implies that $y_2$ must be an “outermost” neighbor of $d_1$ among all
elements in $N(d_1) \cap B(C_{i+1,j})$. If this were not the case, then there would be an
edge from $d_1$ to a vertex on $B(C_{i+1,j})$ that leaves the closed region bounded by
$\{d_1, y_2\}$, the path from $y_2$ to z, and the corresponding path from z to $d_1$ along
$B(B(C_{i+1,j}))$. Hence, $y_2$ is in the upper triple of layer $L_i$ which is associated to
the layer-component $C_{i+1,j}$ and $d_1$. This contradicts the fact that P' avoids $S_i$.

Now, suppose that $y_2$ is dominated by a vertex $d_2 \in D_i$ (see Fig. 4). By
definition of the middle triples, this clearly implies that $y_2$ is in the middle
triple associated to $C_{i+1,j}$ and $d_2$. Again, this contradicts the assumption that
$P' \cap S_i = \emptyset$.

Consequently, the dominating vertex $d_3$ of $y_2$ has to lie in layer $L_{i+1}$. Let
$\{d_3, d_3^1, d_3^2\}$, where $d_3^1, d_3^2 \in N(d_3) \cap B(C_{i+1,j})$, be the lower triple associated to
layer-component $C_{i+1,j}$ and $d_3$ (see Fig. 4). By definition, $C_{i+1,j}$ is contained
in the region enclosed by $\{d_3^1, d_3\}$, $\{d_3, d_3^2\}$ and the path from $d_3^2$ to $d_3^1$ along
$B(C_{i+1,j})$, which (assuming that $y_2 \notin \{d_3, d_3^1, d_3^2\}$) does not hit $y_2$ (see Fig. 4).
We now observe that, whenever the path from $y_1$ to $y_2$ leaves the cycle $B(C_{i+1,j})$
to its exterior, say at a vertex q, then it has to return to $B(C_{i+1,j})$ at a vertex
$q' \in N(q) \cap B(C_{i+1,j})$. This, however, shows that the path P' has to hit either
$d_3^1$ or $d_3^2$ on its way from $y_1$ to $y_2$. Since $d_3^1, d_3^2 \in S_i$, this case also contradicts
the fact that $P' \cap S_i = \emptyset$. □

Lemma 1. $|S_i| \le 5(t_{i-1} + t_i + t_{i+1}) + 12 c_{i+1}$.

Proof. We give bounds for the number of vertices in upper, middle and lower
triples of layer $L_i$, separately.

Firstly, we discuss the upper triples of layer $L_i$, which were associated to a
non-empty layer-component $C_{i+1,j}$ of layer $L_{i+1}$ and a vertex $x \in D_{i-1}$ with
neighbors in $B(C_{i+1,j})$. Consider the bipartite graph G' which has vertices for
each non-empty layer-component $C_{i+1,j}$ and for each vertex in $D_{i-1}$. Whenever
a vertex in $D_{i-1}$ has a neighbor in $B(C_{i+1,j})$, an edge is drawn between the
corresponding vertices in G'. Each edge in G', by construction, may correspond to
an upper triple of layer $L_i$. Note that G' is a planar bipartite graph whose
bipartition subsets consist of $t_{i-1}$ and $c_{i+1}$ vertices, respectively. Thus, the number of
edges of G' is linear in the number of vertices; more precisely, it is bounded by
$2(t_{i-1} + c_{i+1})$. From this, we obtain an upper bound for the number of vertices
in upper triples of layer $L_i$ as follows: Potentially, each vertex of $D_{i-1}$ appears in
an upper triple and, for each edge in G', we possibly obtain two further vertices
in an upper triple. This shows that the total number of vertices in upper triples
is bounded by $t_{i-1} + 4(t_{i-1} + c_{i+1})$.

A similar analysis can be used to show that the number of vertices in the
lower triples is bounded by $t_{i+1} + 4(t_{i+1} + c_{i+1})$ and that the number of vertices
in the middle triples can be bounded by $t_i + 4(t_i + c_{i+1})$.

By definition of $S_i$, this proves our claim. □

Note that, by a more detailed investigation, the bound given in Lemma 1
probably can be improved. One observes, e.g., that the planar bipartite graph
G', which was constructed in the proof, has the special property that it is a
“hyperplane” bipartite graph, i.e., one of the bipartition subsets can be arranged
on a line and all edges of the graph lie in one halfplane of this line. This property
of G' is immediate from the fact that the upper triples associated to a non-empty
layer-component $C_{i+1,j}$ lie within the boundary cycle $B(B(C_{i+1,j}))$. For such
graphs, first investigations indicate that one can obtain better estimates on the
number of their edges than the ones used in the proof of Lemma 1.

A similar observation can be made for estimating the bounds for the lower 
triples. 

Lemma 2. $c_i \le t_i + t_{i+1} + t_{i+2}$.

Proof. By definition, $c_i$ refers to only non-empty layer-components in layer $L_i$,
i.e., there is at least one vertex of layer $L_{i+1}$ contained within each such
layer-component. Such a vertex can only be dominated by a vertex from layer $L_i$,
$L_{i+1}$, or $L_{i+2}$. In this way, we get the claimed upper bound. □



Lemma 3. $\sum_{i=1}^{r} |S_i| \le 51k$, where r is the number of layers of the graph.

Proof. This follows directly when we combine the previous two lemmas. □ 
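
The combination can be spelled out as follows. This is a reconstruction of the omitted arithmetic, assuming the bounds $|S_i| \le 5(t_{i-1} + t_i + t_{i+1}) + 12c_{i+1}$ and $c_i \le t_i + t_{i+1} + t_{i+2}$ from the two previous lemmas, $\sum_i t_i \le k$, and the convention that every $t_j$ with index outside $1, \ldots, r$ is zero:

```latex
\begin{align*}
\sum_{i=1}^{r} |S_i|
  &\le \sum_{i=1}^{r} \bigl( 5(t_{i-1} + t_i + t_{i+1}) + 12\,c_{i+1} \bigr) \\
  &\le \sum_{i=1}^{r} \bigl( 5(t_{i-1} + t_i + t_{i+1})
        + 12(t_{i+1} + t_{i+2} + t_{i+3}) \bigr) \\
  &\le 5 \cdot 3k + 12 \cdot 3k \;=\; 51k .
\end{align*}
```

The last step uses that each $t_j$ occurs at most three times in each of the two groups of terms when the sum over i is expanded.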

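Consider the following three sets of vertices: $\mathcal{S}_0 = S_1 \cup S_4 \cup S_7 \cup \ldots$, $\mathcal{S}_1 =
S_2 \cup S_5 \cup S_8 \cup \ldots$ and $\mathcal{S}_2 = S_3 \cup S_6 \cup S_9 \cup \ldots$. As $|\mathcal{S}_0| + |\mathcal{S}_1| + |\mathcal{S}_2| \le 51k$, one
of these sets has size at most $17k$, say $\mathcal{S}_\delta$ (with $\delta \in \{0, 1, 2\}$).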

Theorem 11. A planar graph with domination number k has treewidth of at
most $6\sqrt{34}\sqrt{k}$.

Proof. Let $\delta$ and $\mathcal{S}_\delta$ be as obtained above. Let $d := \frac{3}{2}\sqrt{34}$. We now go through
the sequence $S_{1+\delta}, S_{4+\delta}, S_{7+\delta}, \ldots$ and look for separators of size at most $s(k) :=
d\sqrt{k}$. Due to the estimate on the size of $\mathcal{S}_\delta$, such separators of size at most s(k)
must appear within each $n(k) := 17 d^{-1} \sqrt{k} = \frac{1}{3}\sqrt{34}\sqrt{k}$ sets in the sequence. In
this manner, we obtain a set of disjoint separators of size at most s(k) each, such
that any two consecutive separators from this set are at most $3n(k)$ layers apart.
Clearly, the separators chosen in this way fulfil the requirements in Proposition 4.
Observe that the components cut out in this way each have at most $3n(k)$
layers and, hence, their treewidth is bounded by $9n(k)$ due to Proposition 2.
Using Proposition 4, we can compute an upper bound of the treewidth tw of
the originally given graph with domination number k:

$$\mathrm{tw}(k) \le 2s(k) + 9n(k) = 2\left(\tfrac{3}{2}\sqrt{34}\sqrt{k}\right) + 9\left(\tfrac{1}{3}\sqrt{34}\sqrt{k}\right) = 6\sqrt{34}\sqrt{k}.$$

This proves our claim. □

Observe that the tree structure of the tree decomposition obtained in the 
preceding proof corresponds to the structure of the layer decomposition forest. 

How did we come to the constants? We simply computed the minimum of
$2s(k) + 9n(k)$ (the upper bound on the treewidth) given the bound $s(k)\,n(k) \le
17k$. This suggests $s(k) = d\sqrt{k}$, and d is optimal when $2s(k) = 9n(k) = 9 \cdot 17 \cdot
k \cdot s(k)^{-1}$, so $2d = 9 \cdot 17 / d$, i.e., $d = \frac{3}{2}\sqrt{34}$.
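
The balancing argument is easy to check numerically. The Python sketch below uses a hypothetical helper `balance_constants` whose parameters are the three numbers appearing in the argument (17 from the separator estimate, the weights 2 and 9 from the treewidth bound); it returns d, the coefficient of $\sqrt{k}$ in n(k), and the resulting coefficient of $\sqrt{k}$ in the treewidth bound.

```python
import math


def balance_constants(budget=17.0, sep_weight=2.0, layer_weight=9.0):
    """Minimize sep_weight * s + layer_weight * n subject to s * n = budget."""
    # Setting sep_weight * d = layer_weight * budget / d gives the optimum.
    d = math.sqrt(layer_weight * budget / sep_weight)
    n_coeff = budget / d
    return d, n_coeff, sep_weight * d + layer_weight * n_coeff
```

With the default values this returns approximately (8.746, 1.944, 34.986), matching $\frac{3}{2}\sqrt{34}$, $\frac{1}{3}\sqrt{34}$ and $6\sqrt{34}$; the exponent base of the final running time is therefore 3 raised to roughly 35.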

As already mentioned above, it seems to be possible to improve upon the
bound of the treewidth by a more refined analysis.

3.2 Tree Decomposition

The proofs above can be turned into constructive algorithms that find tree
decompositions of the stated widths. From the proof that an r-outerplanar
graph has treewidth at most $3r - 1$, one can construct a linear time algorithm
that indeed finds a tree decomposition of width $3r - 1$ of a given r-outerplanar
graph. The proofs in this paper can also be made constructive, but there is one
point that needs specific attention. As we do not start with the dominating set
given, we cannot construct the upper, middle, and lower triples. Instead, we
compute the minimum separator between $L_{i-1}$ and $L_{i+2}$ directly, and use that
set instead of $S_i$ as defined in Section 3.1. Such a minimum separator
can be computed with well known techniques based on maximum flow.
The running time to find one such separator is $O(s n')$, where s is the size
of the separator, and n' the number of vertices that are involved. The total time
to find all separators, stopping when separators become so large that they will
not be used further in the algorithm, can be bounded by $O(\sqrt{k}\, n)$.

Theorem 12. The planar dominating set problem can be solved in time 
O(c'^n), where k is the domination number of the given graph of size n, and 
c=36^. 

Proof. A tree decomposition of $G$ of width $6\sqrt{34}\sqrt{k}$ can be constructed in
$O(\sqrt{k}\,n)$ time. (If $k$ is not known in advance, then an $O(\sqrt{k}\,n)$ time algorithm is
still possible for this step, using detailed bookkeeping techniques. Otherwise, one
can try different values of $k$; this can be done at the cost of an extra multiplicative
factor of $O(\log k)$ by using binary search.) Then, this tree decomposition can
be used to solve the dominating set problem with the dynamic programming
algorithm for graphs of bounded treewidth described earlier. □

The constant $c$ above is $3^{6\sqrt{34}}$, which is rather large. However, a more refined
analysis will help to reduce this constant significantly. Moreover, it is a worst 
case estimate, which might be far from what happens in practical applications. 



4 Variations of dominating set and disk dimension 

For several variations of the dominating set problem, our technique can also 
help to obtain algorithms with a similar running time. In particular, we have 
the following. Let DOMINATING SET WITH PROPERTY P be the following graph
problem: Given a graph $G = (V, E)$, find a minimum-size set $W \subseteq V$ such that $W$
is a dominating set and property $P(W)$ holds.

Theorem 13. Suppose there is an algorithm that solves in $O(q^{\ell} \cdot n)$ time the
DOMINATING SET WITH PROPERTY P problem on graphs, given a tree decomposition
of width $\ell$ with $n$ nodes, for some constant $q$. Then the DOMINATING SET
WITH PROPERTY P problem can be solved in $O(q^{d\sqrt{k}} \cdot n)$ time on planar graphs,
where $k$ is the minimum size of a dominating set with property $P$ and $d = 6\sqrt{34}$.
Proof. If the planar graph $G$ admits a dominating set with property $P$ of size
at most $k$, then, clearly, $G$ has domination number at most $k$. By Theorem 11,
the treewidth of $G$ is bounded by $6\sqrt{34}\sqrt{k}$. According to the discussion in
Section 3.2, a corresponding tree decomposition can be found in time $O(\sqrt{k}\,n)$. The
assumption on the existence of an $O(q^{\ell} \cdot n)$ time algorithm for a given tree
decomposition of width $\ell$ then yields the claim. □

Problems for which the condition of Theorem 13 holds and, hence, for which
we can find such an $O(c^{\sqrt{k}} \cdot n)$ time algorithm are, for instance, the INDEPENDENT
DOMINATING SET problem, the TOTAL DOMINATING SET problem, or the CONNECTED
DOMINATING SET problem.

We now turn our attention to the DISK DIMENSION problem,
which is the following: Given a plane graph $G = (V, E)$ (i.e., a planar graph
with a fixed embedding), find a minimum set of faces that cover all vertices
of $G$. We can use the techniques established for solving DOMINATING SET WITH
PROPERTY P on planar graphs to solve the DISK DIMENSION problem:

Let $G = (V, E)$ be a plane graph. Consider the following graph: Add a vertex
to each face of $G$, and make each such “face vertex” adjacent to all vertices that
are on the boundary of that face. Let $G' = (V', E')$ be the resulting graph. Write
$V' = V \cup V_F$, where $V_F$ is the set of vertices that represent a face in $G$.

For $W \subseteq V'$, we define $P'(W) = \text{true}$ if and only if $W \subseteq V_F$. Then, by
construction, there is a one-to-one correspondence between the sets of faces that
cover the vertices of $G$ and dominating sets in $G'$ with property $P'$. In this sense,
the DISK DIMENSION problem can be transformed to the DOMINATING SET WITH
PROPERTY P' problem in linear time.
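The construction of $G'$ can be illustrated with a small sketch. The input format (a list of faces, each given by the vertices on its boundary) and all function names are assumptions made for this example. Following the red/blue reading used later in the proof of Theorem 14, only the original vertices of $G$ are required to be dominated; the exhaustive search is for illustration only and is exponential in the number of faces.

```python
from itertools import combinations


def face_vertex_graph(vertices, edges, faces):
    """Build G' by adding one vertex per face, adjacent to that face's boundary."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    face_nodes = []
    for i, boundary in enumerate(faces):
        node = ("face", i)
        adj[node] = set(boundary)
        for v in boundary:
            adj[v].add(node)
        face_nodes.append(node)
    return adj, face_nodes


def disk_dimension_bruteforce(vertices, edges, faces):
    """Smallest set of face vertices dominating every original vertex of G."""
    adj, face_nodes = face_vertex_graph(vertices, edges, faces)
    for size in range(1, len(face_nodes) + 1):
        for chosen in combinations(face_nodes, size):
            chosen_set = set(chosen)
            if all(adj[v] & chosen_set for v in vertices):
                return size, chosen
    return None


# K4 drawn with d inside the triangle a, b, c.
k4_vertices = ["a", "b", "c", "d"]
k4_edges = [("a", "b"), ("b", "c"), ("a", "c"),
            ("a", "d"), ("b", "d"), ("c", "d")]
k4_faces = [("a", "b", "d"), ("b", "c", "d"), ("a", "c", "d"), ("a", "b", "c")]
```

In this embedding of $K_4$ every face has three boundary vertices, so no single face covers all four vertices, while any two faces do.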

Theorem 14. The DISK DIMENSION problem can be solved in time $O(c_1^{\sqrt{k}} n)$,
where $k$ is the disk dimension of the given graph of size $n$, and $c_1 = 2^{6\sqrt{34}}$.

Proof. Consider the graph $G' = (V', E')$ with $V' = V \cup V_F$ as given above. Given
a tree decomposition of width $\ell$, the dominating set problem with property $P'$
can be solved in time $O(2^{\ell} \cdot n)$, similar to the dynamic programming algorithm
for the general dominating set problem. Observe that the tables we have to
use for each bag are smaller than for the general dominating set problem, since
each vertex of $V_F$ is either in the dominating set or not and each vertex of $V$ is
either dominated or not. This gives table size $2^{\ell}$. Theorem 13 and the one-to-one
correspondence between this problem and the DISK DIMENSION problem yield the
claim. □

We remark that the problem DOMINATING SET WITH PROPERTY P' as defined
above is, in a bipartite variant, basically called PLANAR RED/BLUE DOMINATING
SET by Downey and Fellows (p. 38), who derive an $O(12^k n)$ algorithm for
this problem. In the same place, they give an $O(12^k n)$ algorithm for DISK
DIMENSION, which they call FACE COVER NUMBER FOR PLANAR GRAPHS. Hence,
our observations lead to asymptotic improvements of their results.
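To see what "asymptotic" means here, one can compare the exponential factors $2^{6\sqrt{34}\sqrt{k}}$ and $12^k$ numerically. The sketch below is a back-of-the-envelope comparison only: it ignores the constants hidden in the $O$-notation and the common factor $n$, and the function names are hypothetical. Logarithms are compared to avoid very large numbers.

```python
import math

D = 6 * math.sqrt(34)  # coefficient of sqrt(k) in the treewidth bound


def log_sqrt_bound(k):
    """Natural logarithm of 2^(6*sqrt(34)*sqrt(k))."""
    return D * math.sqrt(k) * math.log(2)


def log_linear_bound(k):
    """Natural logarithm of 12^k."""
    return k * math.log(12)


def crossover():
    """Smallest k for which the sqrt(k)-exponent bound is below 12^k."""
    k = 1
    while log_sqrt_bound(k) >= log_linear_bound(k):
        k += 1
    return k
```

Under these simplifications the bound with exponent proportional to $\sqrt{k}$ becomes the smaller one from $k = 96$ onward, which is consistent with the remark that the constants are worst-case estimates.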
5 Conclusion

In this paper, we presented a treewidth-based approach to improve the fixed 
parameter complexity of the planar dominating set and the disk dimension 
problem drastically — we gained an exponential improvement over previous exact 
solutions for the problems. Seemingly for the first time, our results provide fixed 
parameter algorithms whose exponential factor has an exponent sublinear in the 
parameter. 

In the long version of this paper, we plan to give improved estimates for 
the constant bases of the exponential terms. In addition, it would be interesting 
to investigate the practical usefulness of our result, since our estimates for the 
constants are worst case and very pessimistic ones. It also is interesting to see 
if these results can be extended to more variants of Dominating Set and to 
other graph classes (e.g., graphs of bounded genus). Another interesting open 
problem is how to use the techniques of this paper for the variant of the disk 
DIMENSION problem, where the embedding is not given as an input (i.e., for a
given planar graph, find an embedding with minimum number of faces that cover 
all the vertices). 

Finally, we remark that similar results on planar dominating set and 
related problems can be obtained by making use of the small separator techniques 
presented in this paper together with the algorithms for outerplanarity-bounded 
graphs developed by Baker, which would also yield running times of the form
$O(c^{\sqrt{k}} n)$ for some constant $c$, where $k$ is the domination number of the given
graph.
Abstract. We present $O(n^3)$ embedding algorithms (generalizing subgraph
isomorphism) for classes of graphs of bounded pathwidth, where $n$
is the number of vertices in the graph. These include the first polynomial-time
algorithm for minor containment and the first $O(n^c)$ algorithm ($c$
a constant independent of $k$) for topological embedding of graphs from
subclasses of partial $k$-trees. Of independent interest are structural
properties of $k$-connected graphs of bounded pathwidth on which our
algorithms are based. We also describe special cases which reduce to various
generalizations of string matching, permitting more efficient solutions.

1 Introduction 

Many fundamental problems in a diverse set of research areas can be charac- 
terized as graph embedding problems, where data is represented as graphs and 
patterns can be detected by finding smaller graphs in larger ones. Classic pattern- 
matching problems make use of the subgraph isomorphism problem, namely, the 
problem of determining whether there is a subgraph of an input graph H that is 
isomorphic to an input graph G. Viewed as an injective mapping, the subgraph 
isomorphism of G into H consists of a mapping of vertices of G to vertices of 
H so that edges of G map to corresponding edges of H. Generalizations of this 
mapping include topological embedding, where vertices of G map to vertices of 
H and edges of G map to vertex-disjoint paths in H, and minor containment,
where vertices of G map to disjoint connected subgraphs of H and edges of G 
map to edges of H. 

Subgraph isomorphism (and therefore its generalizations listed above) is
known to be NP-complete for general graphs, but can be solved in polynomial
time for many restricted classes of graphs. Of particular interest are partial
$k$-trees, also known as graphs of bounded treewidth (to be defined formally
in Section 2), algorithms for which unify many of the known polynomial-time
algorithms for embedding problems. The embedding problems are also
NP-complete for general partial $k$-trees (implied by [Sys82]), even under many
different connectivity and degree bounds on both graphs. However, when
both $G$ and $H$ are $k$-connected partial $k$-trees, there are polynomial-time
algorithms for both subgraph isomorphism and topological embedding [GN94],
but minor containment remains NP-complete even for this
restricted class.

The state of our knowledge about these problems is unsatisfying in a number
of ways. The degree of the polynomial in the complexity of the algorithms for
subgraph isomorphism and topological embedding depends on the magnitude
of $k$. This raises the question of whether there is an algorithm that runs in
time $O(n^c)$ for $c$ a constant independent of $k$. (Such an algorithm would be
unlikely if the problems were fixed-parameter intractable.) Furthermore,
although polynomial-time algorithms for minor containment have been obtained
when there is a degree bound [MT92], there are no previous results relating
connectivity constraints and polynomial-time minor containment algorithms.

Our contributions in this paper demonstrate that for large subclasses of
graphs of bounded pathwidth (a restriction on partial $k$-trees), there exist $O(n^c)$
algorithms for minor containment, topological embedding, and subgraph
isomorphism. The algorithms make use of a new and elegant characterization of
$k$-connected graphs of bounded pathwidth (Section 3), which allows us to form
a common framework for the algorithms (Sections 4 and 5). We show that each
such graph has an essentially unique layout of the vertices on $k$ “tracks”. These
layouts, and the restrictions they imply on the structure of topological
embeddings and minor containments, allow the description of intuitive algorithms with
elegant proofs of correctness. In special cases we can exploit further
structure to reduce the problems to string matching and its variants, permitting
more efficient solutions. In this conference version, we omit many details; proofs
of the most important and complex theorems will be briefly discussed in the
text.



2 Preliminaries 

2.1 Graphs, Treewidth, and Pathwidth 

Throughout this paper we use standard graph-theoretic notation. The
vertex and edge sets of a graph $G$ are denoted by $V(G)$ and $E(G)$ respectively;
we use $n$ to denote $|V(G)|$. All graphs we consider are simple and without
self-loops. The set of vertices adjacent to a vertex $v$, the neighbourhood of $v$, is
denoted by $N(v)$. A graph is $k$-connected if there are $k$ vertex-disjoint paths
between every pair of its vertices. Menger’s Theorem states that any separator
of a $k$-connected graph (a set of vertices whose removal disconnects the graph)
contains at least $k$ vertices.
In this paper we deal with subclasses of graphs of bounded treewidth, as
defined below. 

Definition 1. A tree decomposition of a graph $G$ is a pair $(T, \chi)$ where $T$ is a
tree and $\chi : V(T) \to 2^{V(G)}$ satisfies the following three properties: (1) for every
$a \in V(G)$, there is an $x \in V(T)$ such that $a \in \chi(x)$; (2) for every $e = (a, b) \in
E(G)$, there is an $x \in V(T)$ such that $a, b \in \chi(x)$; (3) for all $x, y, z \in V(T)$,
if $y$ is on the path from $x$ to $z$ in $T$ then $\chi(x) \cap \chi(z) \subseteq \chi(y)$.

The width of a tree decomposition $(T, \chi)$ is $\max\{|\chi(x)| - 1 : x \in V(T)\}$. The
treewidth of a graph $G$ is the minimum width over all its tree decompositions;
a graph of bounded treewidth is a graph of treewidth $k$ for some constant $k$. For
$p$ a vertex of $T$, $\chi(p)$ is called a bag of $T$. A path decomposition of a graph is a
tree decomposition in which $T$ is a path, and the pathwidth of $G$ is the minimum
width over all its path decompositions. For fixed $k$, decompositions of treewidth
or pathwidth $k$ can be found in linear time [Bod93].
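The three properties of Definition 1 translate directly into a small checker. The sketch below is illustrative only: the function names and input format (bags as a dictionary from tree nodes to vertex sets) are assumptions, and the code takes for granted that `tree_edges` really forms a tree. Property (3) is tested in its equivalent form: for every vertex of the graph, the tree nodes whose bags contain it induce a connected subtree.

```python
def is_tree_decomposition(graph_vertices, graph_edges, tree_edges, bags):
    """Check properties (1)-(3) of a tree decomposition.

    bags maps each tree node to the set of graph vertices in its bag;
    tree_edges is assumed (not verified) to describe a tree on those nodes.
    """
    # (1) every vertex appears in some bag
    for a in graph_vertices:
        if not any(a in bag for bag in bags.values()):
            return False
    # (2) both endpoints of every edge share some bag
    for a, b in graph_edges:
        if not any(a in bag and b in bag for bag in bags.values()):
            return False
    # (3) the nodes containing a given vertex form a connected subtree
    tree_adj = {x: set() for x in bags}
    for x, y in tree_edges:
        tree_adj[x].add(y)
        tree_adj[y].add(x)
    for a in graph_vertices:
        nodes = {x for x, bag in bags.items() if a in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in tree_adj[x]:
                if y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
        if seen != nodes:
            return False
    return True


def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1
```

For the path a, b, c, the bags {a, b} and {b, c} on a two-node tree form a decomposition of width 1, whereas inserting a bag {c} between them breaks property (3) for vertex b.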

There is an equivalent definition of graphs of bounded treewidth which is
often useful. A $k$-tree (sometimes full $k$-tree) is either a $(k + 1)$-clique, or a graph
formed from a smaller $k$-tree by adding a new vertex $v$ of degree $k$ adjacent to
all vertices of a $k$-clique $C$ ($C$ is called the attachment clique of $v$). A $k$-leaf is
any degree-$k$ vertex of a $k$-tree, and a partial $k$-tree is any subgraph of a $k$-tree.
Partial $k$-trees are exactly graphs of treewidth at most $k$.

A full $k$-path is a special type of $k$-tree. In its construction, we maintain the
notion of a “current clique” (initially $k$ vertices of the first $(k + 1)$-clique, the
remaining vertex being the initial $k$-leaf). When a new vertex is added (with
the current clique as its attachment clique), it enters the current clique, and one
vertex (possibly the new one) leaves the current clique, never to return. Note that
if the new vertex immediately leaves, it is a $k$-leaf (the last vertex added being
the final $k$-leaf). In a proper $k$-path, the new vertex is not permitted to
immediately leave (as a consequence, a proper $k$-path of size at least $k + 2$ has
only two $k$-leaves).

Figure 1(a) below illustrates a full $k$-path, where $a$ is the initial $k$-leaf and
$\{b, c\}$ is the initial current clique. The vertices are added in order $d, e, f, g, h, i$,
with $i$ being the final $k$-leaf. The graph is not a proper $k$-path since after $f$
is added to attachment clique $\{d, e\}$, it leaves immediately, allowing $g$ to have
attachment clique $\{d, e\}$ as well.




Fig. 1. Example of a full $k$-path
The class of partial $k$-paths (subgraphs of $k$-paths) is equivalent to the class
of graphs of pathwidth at most $k$. The terminology we use here is in
common use, though the original use of “$k$-path” was
to refer to what we call a proper $k$-path, and other authors have used
“proper pathwidth” as a synonym for bandwidth. A partial proper $k$-path is a
subgraph of a proper $k$-path.

We observe that a full $k$-path can be partitioned into the body, which is a
proper $k$-path that includes the initial and final $k$-leaves, and hairs, which are
the remaining $k$-leaves and their adjacent edges. We define an end of a full
$k$-path to be the neighborhood $h$ of a degree-$k$ vertex such that the subgraph of
$G$ induced by $V(G) \setminus h$ has at most one component of size greater than 1.
There are at most two possible ends in a full $k$-path of size at least $k + 3$, and
the initial and final $k$-leaves each have a distinct end as their neighbor set (the
head and tail, respectively). In Figure 1(b), the body is marked with thick lines
and the hair with thin lines; we can view $N(a)$ as the head and $N(i)$ as the tail.
We can view a partial $k$-path as being decomposed in a similar fashion, where the
body is a partial proper $k$-path; in a $k$-connected graph the ends will still be
neighborhoods of degree-$k$ vertices.

Our algorithms will make use of a special type of path decomposition, as 
defined below: 

Definition 2. A path decomposition $(P, \chi)$, $P = p_1, \ldots, p_\ell$, of a graph $G$ is a
normalized path decomposition if (1) $|\chi(p_i)| = k + 1$ for $i$ odd; (2) $|\chi(p_i)| = k$
for $i$ even; and (3) $\chi(p_{i-1}) \cap \chi(p_{i+1}) = \chi(p_i)$ for even $i$.

Notice that $\ell$ is always odd in this definition. It is not difficult to see that such
a decomposition can be generated during the construction of a $k$-path; the bags
of size $k + 1$ are the attachment cliques plus respective new vertices, and the
bags of size $k$ are the current cliques. Given an already-constructed $k$-path, one
possible construction sequence can be established by a simple linear-time scan,
starting from one end.

Throughout this paper, we assume that $G$ and $H$ are $k$-connected graphs of
pathwidth $k$, and that all path decompositions are normalized.

2.2 Embeddings 

Each of the embeddings considered in this paper can be defined in terms of 
injective mappings. A subgraph isomorphism maps vertices of G to vertices of 
H and edges of G to edges of H; it is a special case of a topological embedding, 
which maps vertices of G to vertices of H and edges of G to vertex-disjoint paths 
in H. 

Definition 3. The graph $G$ is topologically embeddable in the graph $H$ if there
is a pair of injective functions $f : V(G) \to V(H)$ and $\phi : E(G) \to \{\text{paths in } H\}$
such that: (1) if $e = (a, b) \in E(G)$ then $\phi(e)$ has endpoints $f(a)$ and $f(b)$; and
(2) for $e, e' \in E(G)$, $e \ne e'$, the only vertices that $\phi(e)$ and $\phi(e')$ can have in
common are their endpoints.
A graph $G$ is a minor of a graph $H$ if a graph isomorphic to $G$ can be
formed from $H$ by a series of edge and vertex deletions and edge contractions.
Equivalently, each vertex of $G$ is mapped to a distinct connected subgraph of $H$
and each edge of $G$ to a distinct edge of $H$, as defined below.

Definition 4. The graph $G$ is a minor of the graph $H$ if there is a pair of
functions $(f, g)$ such that: (1) $f : V(G) \to \{\text{connected subgraphs of } H\}$; (2)
$g : E(G) \to E(H)$ is injective; (3) if $e = (a, b) \in E(G)$ then there are
vertices $u \in V(f(a))$ and $v \in V(f(b))$ such that $g(e) = (u, v)$; and (4) for every
$a, b \in V(G)$, $a \ne b$, $V(f(a)) \cap V(f(b)) = \emptyset$. We call $(f, g)$ a minor embedding of
$G$ into $H$.
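The four conditions of Definition 4 can be checked mechanically for a candidate pair of functions. The following sketch is an illustration under stated assumptions: each image $f(a)$ is represented by its vertex set and is taken to be the subgraph of $H$ induced by that set, edges are treated as undirected, and the function names and dictionary-based input format are hypothetical.

```python
def is_connected(adj, nodes):
    """True if the subgraph induced by `nodes` is non-empty and connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def is_minor_embedding(g_edges, h_adj, f, g):
    """Check conditions (1)-(4) of a minor embedding.

    f maps each vertex of G to a set of vertices of H;
    g maps each edge of G (a tuple from g_edges) to an edge (u, v) of H.
    """
    # (1) every image induces a connected subgraph of H
    if not all(is_connected(h_adj, f[a]) for a in f):
        return False
    # (4) images of distinct vertices are disjoint
    names = list(f)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if set(f[a]) & set(f[b]):
                return False
    # (2) g is injective on (undirected) edges
    images = [frozenset(g[e]) for e in g_edges]
    if len(set(images)) != len(images):
        return False
    # (3) g(e) is an edge of H joining the images of the endpoints of e
    for a, b in g_edges:
        u, v = g[(a, b)]
        if v not in h_adj[u] or u not in f[a] or v not in f[b]:
            return False
    return True
```

As a usage example, a triangle is a minor of a 4-cycle: contracting one edge of the cycle corresponds to mapping one triangle vertex to a two-vertex image.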

In discussing properties of embeddings of $G$ into $H$, we will often rely on the
fact that if $G$ is embeddable in $H$, we can derive an induced path decomposition
of $G$ from the path decomposition of $H$; details of this process for topological
embedding were developed for $k$-connected partial $k$-trees. To facilitate
the definition, for a topological embedding $(f, \phi)$ we define a surjective function
$\psi$ that inverts $f$ on its image, and maps each interior vertex in $\phi(e)$ to one of
the endpoints of $e$. That is, $\psi(f(a)) = a$ for every vertex $a$ of $G$, and for every
edge $e = (a, b)$ of $G$, there is a vertex $u$ on $\phi(e)$ such that if $Q$ is the subpath of
$\phi(e)$ from $f(a)$ to $u$, and $R = \phi(e) \setminus Q$, then for all vertices $v$ of $Q$, $\psi(v) = a$ and
for all vertices $w$ of $R$, $\psi(w) = b$.

Definition 5. For $(f, \phi)$ a topological embedding of $G$ into $H$, $\psi$ the associated
mapping of vertices in $V(H)$ to vertices in $V(G)$, and $(P, \chi)$ a path
decomposition of $H$, we define $\mu : V(P) \to 2^{V(G)}$ as follows: $\mu(p) = \{\psi(u) \mid u \in
\chi(p)$ and either $f(a) = u$ for some $a$ or $u$ appears on $\phi(e)$ for some $e \in E(G)\}$.
We form a path $P_G$ from $P$ by removing each node $p \in V(P)$ such that $|\mu(p)| = 0$
and by replacing each subpath $q_1, \ldots, q_m$ such that $\mu(q_i) = \mu(q_j)$ for all $i, j$ by a
single node $q$ with $\mu(q) = \mu(q_i)$. We form $\chi_G$ by restricting $\mu$ to $P_G$. $(P_G, \chi_G)$
is the path decomposition of $G$ induced by $(f, \phi)$.

A similar definition can be made for minor embedding, and we can show that for
both topological embedding and minor containment, $(P_G, \chi_G)$ is a normalized
path decomposition of width $k$.
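The construction in Definition 5 amounts to relabeling the bags of $H$ through $\psi$, dropping empty bags, and merging runs of equal bags. A minimal sketch, assuming the path decomposition of $H$ is given as a list of bags in path order and $\psi$ as a dictionary defined exactly on the vertices of $H$ used by the embedding (both representations, and the function name, are choices made for this example):

```python
def induced_path_decomposition(bags_h, psi):
    """Bags of the path decomposition of G induced by an embedding.

    bags_h: list of sets of vertices of H, in path order.
    psi: maps each vertex of H that is an image f(a) or lies on some
         path phi(e) to the corresponding vertex of G.
    """
    induced = []
    for bag in bags_h:
        mu = frozenset(psi[u] for u in bag if u in psi)
        if not mu:
            continue  # remove nodes p with |mu(p)| = 0
        if induced and induced[-1] == mu:
            continue  # merge a run of nodes with identical mu
        induced.append(mu)
    return induced
```

For instance, if vertices 2 and 3 of $H$ both map to $a$ and vertex 4 maps to $b$, the bags {1, 2}, {2, 3}, {3, 4} of $H$ induce the bags {a}, {a, b} of $G$.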

3 Track Layouts 

The additional requirement of $k$-connectivity imposes strong restrictions on the
structure of partial $k$-paths. We show that the body vertices can be partitioned
into $k$ tracks, where the tracks form vertex-disjoint paths from the initial $k$-leaf to
the final $k$-leaf. This partitioning is unique up to permutation of the tracks, and
is independent of any specific path decomposition. Track layouts of full $k$-paths
were considered previously [Pro84], but our characterization of partial
$k$-path embeddings is new.

Tracks can be extracted by examining a path decomposition of the graph.
For $(P, \chi)$ a normalized width-$k$ path decomposition of $G$, $P = p_1, \ldots, p_\ell$, we
define the entry vertex of $p_i$ ($i$ odd, $i > 1$), $\mathrm{entry}(p_i)$, to be the unique vertex in
$\chi(p_i) \setminus \chi(p_{i-1})$. Similarly, the exit vertex of $p_i$ ($i$ odd, $i < \ell$), $\mathrm{exit}(p_i)$, is the unique
vertex in $\chi(p_i) \setminus \chi(p_{i+1})$. A vertex $x$ of $G$ is a hair vertex (a non-initial non-final
$k$-leaf) exactly when $x = \mathrm{entry}(p_i) = \mathrm{exit}(p_i)$, and hence when $\mathrm{entry}(p_i)$ or
$\mathrm{exit}(p_i)$ is a body vertex of $G$, $\mathrm{entry}(p_i) \ne \mathrm{exit}(p_i)$.

For body vertices, we can use exit and entry information to form paths
in $G$. If $\mathrm{entry}(p_i)$ is not a $k$-leaf, it must be a neighbor of $\mathrm{exit}(p_i)$ (otherwise
$\chi(p_i) \setminus \{\mathrm{entry}(p_i), \mathrm{exit}(p_i)\}$ is a set of size $k - 1$ separating $\mathrm{entry}(p_i)$ and $\mathrm{exit}(p_i)$
in $G$). We say that $\mathrm{entry}(p_i)$ replaces $\mathrm{exit}(p_i)$. A track is a sequence of body
vertices $a_1, a_2, \ldots, a_t$ such that $a_{i+1}$ replaces $a_i$ for all $1 \le i < t$. We use the
interval notation $[a_i, a_j]$ to represent a segment $a_i, \ldots, a_j$ of a track. The track
on which a vertex $a$ appears is denoted by $\mathrm{track}(a)$. For ease of notation, we
say that the $k$-leaves are on track 0. A track edge of $G$ is any edge adjacent to
the initial or final $k$-leaf, or any edge between vertices on the same track. Note
that an internal vertex of a track is adjacent to exactly two other vertices on the
same track, namely the vertex it replaces, and its own replacement. A hair edge
is an edge adjacent to a hair. Any edge that is not a track edge or a hair edge
is called a cross edge. Figure 1(c) illustrates track edges (thick lines), hair edges
(dashed lines) and cross edges (thin lines). The lemma below is a consequence
of the definition of tracks, normalized path decompositions, and $k$-connectivity:

Lemma 1. In a path decomposition $(P, \chi)$ of $G$, there exists exactly one vertex
from each track in $\chi(p_i)$ for $i$ even. □
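The entry/exit bookkeeping above can be sketched in a few lines. The code assumes a normalized path decomposition given as a list of sets with `bags[0]` playing the role of $p_1$ (so the odd-numbered bags sit at even list indices), and it takes the second bag as the head; the function names and the small example are illustrative assumptions, not taken from the paper.

```python
def entry_exit(bags):
    """Entry and exit vertices of the odd-numbered bags (even list indices)."""
    entry, exit_ = {}, {}
    last = len(bags) - 1
    for idx in range(0, len(bags), 2):
        if idx > 0:
            (entry[idx],) = bags[idx] - bags[idx - 1]
        if idx < last:
            (exit_[idx],) = bags[idx] - bags[idx + 1]
    return entry, exit_


def extract_tracks(bags):
    """Follow the 'replaces' relation starting from each head vertex."""
    entry, exit_ = entry_exit(bags)
    successor = {}
    for idx, new in entry.items():
        old = exit_.get(idx)
        # entry == exit identifies a hair; the last bag has no exit vertex.
        if old is not None and old != new:
            successor[old] = new
    tracks = []
    for head_vertex in sorted(bags[1]):
        track = [head_vertex]
        while track[-1] in successor:
            track.append(successor[track[-1]])
        tracks.append(track)
    return tracks


# A proper 2-path: a is the initial leaf, f the final leaf.
example_bags = [frozenset("abc"), frozenset("bc"), frozenset("bcd"),
                frozenset("cd"), frozenset("cde"), frozenset("de"),
                frozenset("def")]
```

On `example_bags` the two tracks are b, d and c, e, and every bag of size $k = 2$ contains exactly one vertex of each track, as Lemma 1 states.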

We can view a layout of the vertices of $G$ as starting with an initial
$k$-leaf at the leftmost point, a final $k$-leaf at the rightmost point, and each track
stretched out as a straight line from left to right. Thus, if $b$ replaces $a$ then
we say that $a$ is the track predecessor of $b$ and that $b$ is the track successor
of $a$. The position of a vertex $a$ on a track, denoted $\mathrm{position}(a)$, is defined as
follows: each vertex in the head is in position 1 on its track, and if $b$ replaces
$a$, then $\mathrm{position}(b) = \mathrm{position}(a) + 1$. Moreover, for $\mathrm{track}(a) = \mathrm{track}(b)$ and
$\mathrm{position}(a) < \mathrm{position}(b)$, $a$ is to the left of $b$ and $b$ is to the right of $a$. A track
layout of a graph $G$ of pathwidth $k$ is the numbering of tracks by 1 through $k$
and the association of each vertex $a$ with a pair $(\mathrm{track}(a), \mathrm{position}(a))$.

Our algorithms proceed by attempting to create an embedding by mapping
a track layout of $G$ to a track layout of $H$. The mapping will be particularly
useful if we select the track layout of $G$ that “corresponds” to the track layout
of $H$.

The general idea of the algorithm to find a track layout is as follows: maintain
a set $S$ (initially the head vertices, labeled 1 through $k$), and repeatedly find a
vertex $a$ of $S$ with one unlabeled neighbor $b$; give $b$ the label of $a$ and have
it replace $a$ in $S$. Lemma 2 below implies that the algorithm yields the same
layout independent of the order in which vertices are processed. It shows that
if two vertices $a_1$ and $a_2$ could both be considered as $a$, they cannot both be
replaced by the same vertex $b_1$ (which would lead to two layouts differing in the
track label of $b_1$). Its proof gives an idea of the kinds of connectivity arguments
important in the proofs omitted from this conference version.
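The labeling loop just described can be sketched as follows. This is a simplified illustration, not the paper's full procedure: it assumes the caller already knows the head (in the desired track order) and the set of $k$-leaves (initial leaf, final leaf, and hairs), which are placed on track 0 up front so that only body vertices are labeled. The function name and input format are hypothetical.

```python
def track_layout(adj, head, leaves):
    """Assign (track, position) to every body vertex.

    adj: vertex -> set of neighbours; head: list of head vertices in track
    order; leaves: the k-leaves, treated as already labeled (track 0).
    """
    label = {v: 0 for v in leaves}
    position = {}
    current = {}  # track number -> rightmost labeled vertex on that track
    for t, v in enumerate(head, start=1):
        label[v] = t
        position[v] = 1
        current[t] = v
    progress = True
    while progress:
        progress = False
        for t, a in list(current.items()):
            unlabeled = [b for b in adj[a] if b not in label]
            if len(unlabeled) == 1:
                b = unlabeled[0]
                label[b] = t  # b replaces a on track t
                position[b] = position[a] + 1
                current[t] = b
                progress = True
    return {v: (label[v], position[v]) for v in position}


# The full 2-path with bags {a,b,c}, {b,c,d}, {c,d,e}, {d,e,f}.
example_adj = {
    "a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d", "e"},
    "d": {"b", "c", "e", "f"}, "e": {"c", "d", "f"}, "f": {"d", "e"},
}
```

With head `["b", "c"]` and leaves `{"a", "f"}`, vertex d replaces b on track 1 and e replaces c on track 2. The repeated scan over the tracks is written for clarity rather than for the linear running time claimed in the text.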
Lemma 2. Let $G$ be a $k$-connected graph of pathwidth $k$ and $(P', \chi')$ be a proper
prefix of at least one normalized path decomposition of $G$, where $P' = (p_1, \ldots, p_s)$
for $s$ odd. Then if $a_1 \in \chi'(p_s)$, $N(a_1) \setminus \bigcup_{1 \le i \le s} \chi'(p_i) = \{b_1\}$, $a_2 \in \chi'(p_s)$,
$N(a_2) \setminus \bigcup_{1 \le i \le s} \chi'(p_i) = \{b_2\}$, and $a_1 \ne a_2$, then $b_1 \ne b_2$.

Proof. For at least one vertex $a \in \chi'(p_s)$ such a vertex $b$ exists, namely the
vertex $\mathrm{entry}(p_{s+1})$ for some path decomposition $(P, \chi)$ extending $(P', \chi')$.
Suppose instead that for $a_1 \ne a_2$, $b_1 = b_2$. In any normalized path decomposition
$(P, \chi)$ of which $(P', \chi')$ is a prefix, since $(a_1, b_1)$ and $(a_2, b_1)$ are both edges in
$G$, there must exist a bag $p_j$, $j \ge s + 1$, such that $\{a_1, a_2, b_1\} \subseteq \chi(p_j)$ and
$\{a_1, a_2, b_1\} \cap \chi(p_{j-1}) = \{a_1, a_2\}$. Then $\chi(p_j) \setminus \{a_1, a_2\}$ is a set of size $k - 1$
separating the initial and final $k$-leaves, violating the $k$-connectivity of $G$. □

Theorem 1 below follows as a consequence. Since either end can be the head,
there are at most $2(k!)$ different track layouts of $G$, each of which can be
determined in linear time. Arbitrary degree-$k$ neighbors of the head and tail can be
identified as the initial and final $k$-leaf, with all other degree-$k$ neighbors being
designated as hairs.

Theorem 1. For $G$ a $k$-connected graph of pathwidth $k$, for each head $h$ of $G$
and each numbering (permutation) $\pi$ of the tracks, there is a unique track layout
$(h, \pi)$ of $G$, which can be generated in linear time. □

Although a track layout imposes a total order on the vertices of a particular
track, in general it provides only a partial order on the vertices of the entire
graph. Given a track decomposition starting from a particular head $h$, $a$ comes
before $b$ in the partial order if either $a$ precedes $b$ on the same track, or if there is
an edge from $a$ to a vertex to the left of $b$ on the track of $b$. This order reflects the
fact that in any path decomposition with head $h$, $a$ must appear in a bag before
$b$. The partial order precludes the existence in the track layout of a transposition,
namely a pair of edges $(a_1, b_2)$, $(b_1, a_2)$ with four distinct endpoints, where $a_1$
(respectively $b_1$) is on the same track as and to the left of $a_2$ (respectively $b_2$).
This is important in proving the correctness of our algorithms.
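The definition of a transposition is easy to test directly on a candidate layout. The sketch below is a quadratic-time illustration with hypothetical names; it assumes a layout given as a dictionary from vertices to (track, position) pairs and returns a witnessing pair of edges, if any.

```python
from itertools import combinations


def find_transposition(edges, layout):
    """Return a pair of edges (a1, b2), (b1, a2) forming a transposition, or None.

    layout maps each vertex to (track, position); a1 must lie to the left
    of a2 on one track and b1 to the left of b2 on one track.
    """
    def left_of(x, y):
        return layout[x][0] == layout[y][0] and layout[x][1] < layout[y][1]

    for e1, e2 in combinations(edges, 2):
        # Try both orientations of each undirected edge.
        for a1, b2 in (e1, e1[::-1]):
            for b1, a2 in (e2, e2[::-1]):
                distinct = len({a1, a2, b1, b2}) == 4
                if distinct and left_of(a1, a2) and left_of(b1, b2):
                    return (a1, b2), (b1, a2)
    return None
```

For two tracks with vertices a1, a2 and b1, b2 (in that left-to-right order), the crossing edges (a1, b2) and (b1, a2) form a transposition, while the parallel edges (a1, b1) and (a2, b2) do not.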

We use $N_t(a)$ to denote the set of neighbors of $a$ on track $t$, and where
appropriate we generalize the function to $N_t(A)$ for $A$ a set of vertices.

4 Topological Embedding Algorithm 

Our algorithm takes an initial injective mapping $f$ of vertices of $G$ to vertices
of $H$ and iteratively refines the mapping until it forms an embedding of $G$ into
$H$ or fails. Throughout the execution of the algorithm, the possible mappings
considered will be constrained by track layouts of $G$ and $H$. By choosing an
arbitrary path decomposition of $H$, we can fix a total order on the vertices of
$H$ and a specific track layout. If $G$ is embeddable in $H$, one of the $2(k!)$ track
layouts of $G$ will be associated with the path decomposition of $G$ induced by the
embedding.
We initially restrict our focus to $G$ and $H$ $k$-connected partial proper $k$-paths,
and then we discuss extensions to more general situations. The track layouts of
$G$ and $H$ can be exploited in the discussion of an embedding of $G$ into $H$;
Lemma 3 below shows that track numbers and cross edges are preserved under
the embeddings. A topological embedding from $G$ into $H$ that satisfies
the conditions in the lemma is said to be a topological embedding with respect to
layouts $(h_G, \pi_G)$ and $(h_H, \pi_H)$. Lemma 3 can be proved by showing that the
violation of any of the conditions of the lemma allows us, by using induced path
decompositions, to find a bag in the path decomposition of $G$ violating Lemma 1.

Lemma 3. For $G$ and $H$ $k$-connected partial proper $k$-paths, $G$ is topologically
embeddable in $H$ if and only if there exists a topological embedding $(f, \phi)$ and
track layouts $(h_G, \pi_G)$ of $G$ and $(h_H, \pi_H)$ of $H$ such that for $a_I$ the initial $k$-leaf
and $a_F$ the final $k$-leaf in $G$:

1. for all $a \in V(G) \setminus \{a_I, a_F\}$, $\mathrm{track}(a) = \mathrm{track}(f(a))$;

2. for each track edge $(a, b)$ in $G$, $a \ne a_I$ and $b \ne a_F$, $\phi((a, b))$ consists of the
path from $f(a)$ to $f(b)$ on the track of $a$;

3. for each cross edge $(a, b)$ in $G$, $\phi((a, b))$ consists of the edge $(f(a), f(b))$; and

4. $f(a_I) = u_I$, $f(a_F) = u_F$, for all edges $(a_I, b)$, $\phi((a_I, b))$ is a path from $f(a_I)$
to $f(b)$ with all interior vertices on the track of $b$, and for all edges $(b, a_F)$,
$\phi((b, a_F))$ is a path from $f(b)$ to $f(a_F)$ with all interior vertices on the track
of $b$. □

The algorithm starts by forming a single track layout and total order for
$H$ and all $2(k!)$ track layouts of $G$. For each possible layout of $G$, initially we
assign $f(a) := u$, where $\mathrm{track}(a) = \mathrm{track}(u)$ and $\mathrm{position}(a) = \mathrm{position}(u)$.
Given a mapping of vertices of $G$ to vertices of $H$, extended in the obvious way
to map sets of vertices, we say that $a \in V(G)$ is consistent if for all tracks $t$,
$f(N_t(a)) \subseteq N_t(f(a))$. We next repeatedly check consistency of vertices in $G$.
An inconsistency in which $(a, b) \in E(G)$ but $(f(a), f(b)) \notin E(H)$ is resolved by
changing one or both of $f(a)$ and $f(b)$. Consider the total ordering of edges of $H$
between the tracks of $f(a)$ and $f(b)$ (induced by the total order on vertices of $H$,
since there are no transpositions). The leftmost consistent edge with respect to
$a, b \in V(G)$ and mapping $f$ is the leftmost edge $(u, v)$ (under this total ordering of
edges) such that $\mathrm{position}(f(a)) \le \mathrm{position}(u)$ and $\mathrm{position}(f(b)) \le \mathrm{position}(v)$.
To ensure $(f(a), f(b)) \in E(H)$, $f(a)$ is set to $u$ and $f(b)$ is set to $v$ (which can be
viewed as “sliding” $f(a)$ or $f(b)$ along its track), and we update function $f$ by
“sliding” vertices to the right of $a$ and $b$ as necessary to maintain the invariants
below.

Invariant A. For each $a \in V(G)$, $\mathrm{track}(a) = \mathrm{track}(f(a))$.

Invariant B. For vertices $a$ and $b$ in $V(G)$ such that $\mathrm{track}(a) = \mathrm{track}(b)$, if
$\mathrm{position}(a) < \mathrm{position}(b)$, then $\mathrm{position}(f(a)) < \mathrm{position}(f(b))$.

Lemma 4 follows from Lemma 3.
Lemma 4. If there exists a mapping that satisfies invariants A and B such that,
for each $a \in V(G)$, $a$ is consistent, then there is a topological embedding from $G$
into $H$. □

Given $G$ topologically embeddable in $H$ with respect to track layouts $(h_G, \pi_G)$
of $G$ and $(h_H, \pi_H)$ of $H$, we can determine a partial order among topological
embeddings associated with the track layouts, where $(f_1, \phi_1)$ comes before $(f_2, \phi_2)$
if for all $a \in V(G)$, $\mathrm{position}(f_1(a)) \le \mathrm{position}(f_2(a))$. Our algorithm finds the
unique minimum embedding under this partial order, whose existence is
guaranteed by the following lemma.

Lemma 5. If $G$ is topologically embeddable in $H$ with respect to $(h_G, \pi_G)$ and
$(h_H, \pi_H)$, then there is a unique minimum $f_m$ (with $\phi_m$ defined as in Lemma 3)
associated with $(h_G, \pi_G)$ and $(h_H, \pi_H)$.

Proof. Suppose instead there were incomparable minimal mappings $(f_1, \phi_1)$ and
$(f_2, \phi_2)$; there must exist $a$ and $b$ in $V(G)$ such that the following conditions all
hold: $\mathrm{track}(a) \ne \mathrm{track}(b)$, $\mathrm{position}(f_1(a)) < \mathrm{position}(f_2(a))$ and
$\mathrm{position}(f_2(b)) < \mathrm{position}(f_1(b))$. We can partition the vertices in $V(G)$ into the
following three sets:
$S_1 = \{a \in V(G) \mid \mathrm{position}(f_1(a)) < \mathrm{position}(f_2(a))\}$,
$S_2 = \{a \in V(G) \mid \mathrm{position}(f_2(a)) < \mathrm{position}(f_1(a))\}$, and
$S_= = \{a \in V(G) \mid \mathrm{position}(f_1(a)) = \mathrm{position}(f_2(a))\}$.

We observe that there cannot exist an edge in $E(G)$ between $a \in S_1$ and
$b \in S_2$, since the edges $(f_1(a), f_1(b))$ and $(f_2(a), f_2(b))$ would form a transposition in
$H$. Consequently, all edges are either between vertices in the same set, between
$S_1$ and $S_=$, or between $S_2$ and $S_=$.

We can form $f_3$ such that for $a \in S_1$, $f_3(a) = f_1(a)$, for $a \in S_2$, $f_3(a) = f_2(a)$,
and for $a \in S_=$, $f_3(a) = f_1(a) = f_2(a)$. Clearly all edges can be mapped, and
hence $f_3$ is an embedding violating the minimality of $f_1$ and $f_2$, yielding a
contradiction. □

Lemma 6 below demonstrates that the algorithm finds the minimum embedding.
It is proved by considering the first hypothetical violation, namely a vertex
$a$ for which $\mathrm{position}(f(a)) > \mathrm{position}(f_m(a))$, and arguing that it was
unnecessary to slide $a$ past its location in the minimum embedding, since the edges of
$H$ required for consistency of $a$ exist at that point.

Lemma 6. If $G$ is topologically embeddable in $H$ with respect to track layouts
$(h_G, \pi_G)$ and $(h_H, \pi_H)$, then at any point during the execution of the algorithm
above where these track layouts are chosen, and for any vertex $a \in V(G)$,
$\mathrm{position}(f(a)) \le \mathrm{position}(f_m(a))$, where $f_m$ is the unique minimal $f$ (as
defined in Lemma 5). □

The algorithm implicit in Theorem 1 finds track layouts in linear time. In the
topological embedding algorithm, each of the $n$ vertices slides forward at most $n$
positions, for a total of at most $O(n^2)$ slides. Each slide is the consequence of a failed
consistency check. To check all edges takes $O(n)$ time, and so in $O(n)$ time an
inconsistent pair can be detected, if one exists. The work done in each slide is
$O(n)$. Thus we have described an $O(n^3)$ algorithm for topological embedding of
$k$-connected partial proper $k$-paths.

Theorem 2. For $G$ and $H$ $k$-connected partial proper $k$-paths, it is possible to
determine whether or not $G$ is topologically embeddable in $H$ in $O(n^3)$ time. □

When $H$ can have hairs and thus is no longer proper, the situation is more
complicated. A path of two hair edges in $H$ can be used to embed a cross edge
of $G$ in the preimage of the attachment clique of that hair (which may no longer
be a clique, since $G$ is partial). Since there may be more than one candidate
cross edge, ambiguity is introduced. We are able to resolve this ambiguity to
obtain $O(n^3)$ algorithms when one of $G$ and $H$ is a full $k$-path and the other a
$k$-connected partial $k$-path.

5 Minor Embedding Algorithm 

In the case of minor embedding, it would seem that vertices of $G$ may now
map to seemingly arbitrary connected subgraphs of $H$. However, as we will
see, Lemma 9 gives a structural characterization which limits possible images
of vertices. Throughout this section we assume that $G$ and $H$ are $k$-connected
partial proper $k$-paths.

To facilitate the proof of Lemma 9 we first establish a few properties of
minor embeddings. We focus first on the role of the initial and final $k$-leaves.
Recall that the minor embedding function $f$ maps vertices of $G$ to connected
subgraphs of $H$.

Lemma 7. For any minor embedding $(f, \phi)$ of $G$ into $H$, for any $a \in V(G)$,
if $f(a)$ does not contain either $u_I$ or $u_F$, then for any track layout $(h_H, \pi_H)$ of
$H$ there exists a track $t$ in $(h_H, \pi_H)$ such that $f(a)$ consists of an interval of
vertices $[u, v]$ on $t$. □

Proof. We first demonstrate that $f(a)$ cannot contain any cross edge $(v, w)$ of
$H$. Since $v$ and $w$ are neighbors but neither is the exit vertex of the other, in
any path decomposition $(P, \chi)$ of $H$ there must exist a bag $P_r$, $r$ even, such that
$\{v, w\} \subseteq \chi(P_r)$. If $v$ and $w$ are in $f(a)$, then in the induced path decomposition
$(P_G, \chi_G)$, $|\chi_G(P_r)| < k$, contradicting the $k$-connectivity of $G$.

Since $f(a)$ forms a connected subgraph of $H$ and contains no cross edge nor
$u_I$ nor $u_F$, $f(a)$ is an interval of vertices $[u, v]$ on a single track $t$ in $H$. □



Lemma 8. Let $(f, \phi)$ be a minor embedding of $G$ into $H$ and $(P, \chi)$ any path
decomposition of $H$. Then for $a \in \{a_I, a_F\}$ the initial or final vertex of a track
layout of $G$, there must exist $P_j$, $j$ even, such that $\chi(P_j)$ contains a vertex in
$f(b_i)$ for all $1 \le i \le k$, where $b_1, \ldots, b_k$ are the neighbors of $a$. □

Proof. Since $G$ is $k$-connected, each $b_i$ has a neighbor in $V(G) \setminus \{a, b_1, \ldots, b_k\}$.
Each path decomposition of $G$ must then contain a bag with $\{a, b_1, \ldots, b_k\}$ such
that $b_1, \ldots, b_k$ all appear in the next bag of the decomposition. If there is no
bag $P_r$, $r$ even, in $(P, \chi)$ such that $\chi(P_r)$ contains a vertex in $f(b_i)$ for all
$1 \le i \le k$, then the path decomposition of $G$ induced by $(f, \phi)$ fails to satisfy
the above property, yielding a contradiction. □

With the aid of the preceding two lemmas, the proof of Lemma 9 is similar
to, though more complicated than, the proof of the analogous lemma
for topological embedding.

Lemma 9. For $G$ and $H$ $k$-connected partial proper $k$-paths, $G$ is a minor of
$H$ only if there exist a minor embedding $(f, \phi)$ and track layouts $(h_G, \pi_G)$ and
$(h_H, \pi_H)$ such that, for $a_I$ and $a_F$ the initial and final $k$-leaves of $(h_G, \pi_G)$ and
$u_I$ and $u_F$ the initial and final $k$-leaves of $(h_H, \pi_H)$, the following conditions
hold:

1. for all $a \in V(G) \setminus \{a_I, a_F\}$, $f(a)$ is an interval of vertices $[\ell(f(a)), r(f(a))] =
[u, v]$ on the track of $a$;

2. for each track edge $(a, b)$ in $G$, $\phi((a, b))$ consists of the edge from $r(f(a))$ to
$\ell(f(b))$;

3. for each cross edge $(a, b)$ in $G$, $\phi((a, b))$ consists of an edge from a vertex in
$f(a)$ to a vertex in $f(b)$; and

4. $f(a_I) = \{u_I\}$ and $f(a_F) = \{u_F\}$. □

Proof. Since $G$ is a minor of $H$, there must exist a minor embedding $(g, \gamma)$ from $G$
to $H$. Our proof proceeds by altering this embedding to form $(f, \phi)$ and choosing
track layouts which satisfy the conditions. We fix $(h_H, \pi_H)$ and then create a
layout for $G$ and $(f, \phi)$. For $a \in \{a_I, a_F\}$, we let $b_1, \ldots, b_k$ be the neighbors
of $a$. Since any bag in a path decomposition is a separator in the graph, as a
consequence of Lemma 8 we can conclude that the $f(b_j)$'s separate $f(a)$ from
the remainder of the graph. To satisfy condition 4 it will suffice to consider the
mapping of $a$ and its neighbors to the prefix or suffix of $(h_H, \pi_H)$ up to and
including the $f(b_j)$'s.

By Lemmas □ El and0 each f{bj) maps to a distinct track in (/if,7Tf). By 
choosing $(h_G, \pi_G)$ such that $\mathrm{track}(b_j) = \mathrm{track}(f(b_j))$, we satisfy condition 1 for
each $b_j$.

We consider two cases for each $a \in \{a_I, a_F\}$, depending on whether or not
$g(a)$ contains a $u \in \{u_I, u_F\}$.

Case 1: $g(a)$ contains $u$

We can direct $(h_G, \pi_G)$ so that $u = u_I$ if and only if $a = a_I$. We then form
$f(a)$ by restricting $g(a)$ to the single node $u$, and then set each $f(b_j)$ to be the
union of $g(b_j)$ and each vertex on the path from $u$ to the leftmost vertex in $g(b_j)$
(that is, an initial segment of the vertices on track $\mathrm{track}(b_j)$).

Case 2: $g(a)$ does not contain $u_I$ or $u_F$

We can conclude from Lemma 7 that $g(a)$ consists of an interval $[u, v]$ on a
track $t$ in $H$. By condition 1 applied to the $b_j$'s we can conclude that there is a
path from $g(a)$ to $u \in \{u_I, u_F\}$ which contains no node in $g(b_j)$ for $1 \le j \le k$.
We choose $(h_G, \pi_G)$ so that $u = u_I$ if and only if $a = a_I$.
To create $(f, \phi)$, we set $f(a) = u$. We can then extend the images of the $b_j$'s
in order to include the paths from $u$ to each $g(b_j)$. In particular, for $b_r$ such that
$\mathrm{track}(g(b_r)) = \mathrm{track}(g(a))$, we set $f(b_r)$ to be the union of $g(b_r)$, $g(a)$, and the
path from $g(a)$ to $u$ on $\mathrm{track}(g(a))$. For each other $b_j$ such that $j \neq r$, we set
$f(b_j)$ to be the union of $g(b_j)$ and the path from $g(b_j)$ to $u$ on $\mathrm{track}(g(b_j))$.

In both cases we have satisfied condition 4 and in addition conditions 1 and
2 for the two $k$-leaves and their neighbors. To see that condition 1 holds for the
remaining vertices, we use an argument similar to that used to prove Lemma 0 
in a proof by induction on track position. Namely, we show that if a track 
successor $c$ of a $b_j$ is mapped to a track other than $\mathrm{track}(b_j)$, then there exists
a bag in a path decomposition of $H$ containing vertices in both $f(b_j)$ and $f(c)$.
The result follows from the fact that this will violate Lemma Q in the induced 
path decomposition of G. 

To complete the proof, we observe that condition 3 follows from the fact that
$(g, \gamma)$ is a minor embedding, and that condition 2 follows from condition 1. □

We can show that if there exists a mapping satisfying the following invariants
(analogous to Invariants A and B for topological embedding), with all vertices
in $G$ consistent, then $G$ is a minor of $H$. Furthermore, as for topological
embeddings, we can determine a partial order among minor embeddings associated
with track layouts $(h_G, \pi_G)$ and $(h_H, \pi_H)$, where $f_1$ comes before $f_2$ if for all
$a \in G$, $\mathrm{position}(r(f_1(a))) \le \mathrm{position}(r(f_2(a)))$, and show that there is a unique
minimum.

Invariant C. For each vertex $a \in V(G)$, $f(a) = [u, v]$ for some $u$ and $v$ on the
track of $a$ in $H$.

Invariant D. For each vertex $a \in V(G)$ with track successor $b$,
$\mathrm{position}(\ell(f(b))) = 1 + \mathrm{position}(r(f(a)))$.
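
To make the two invariants concrete, the following sketch checks them for a tentative mapping. The encoding is our own assumption for illustration: each image is a triple (track, left, right) of a track number and two positions in H, and `succ` gives the track successor in G. It is not the authors' implementation.

```python
def check_invariants(track, succ, image):
    """Return True if `image` satisfies Invariants C and D.

    track[a] : track number of vertex a of G (hypothetical encoding)
    succ[a]  : track successor of a, or None if a is last on its track
    image[a] : (track, left, right), the interval [left, right] on a track of H
    """
    for a, (t, left, right) in image.items():
        # Invariant C: f(a) is a nonempty interval on the track of a.
        if t != track[a] or left > right:
            return False
        # Invariant D: the successor's interval starts right after f(a).
        b = succ.get(a)
        if b is not None and image[b][1] != right + 1:
            return False
    return True


track = {"a": 0, "b": 0, "c": 1}
succ = {"a": "b", "b": None, "c": None}
good = {"a": (0, 1, 2), "b": (0, 3, 3), "c": (1, 1, 4)}
bad = {"a": (0, 1, 2), "b": (0, 4, 4), "c": (1, 1, 4)}  # gap between f(a) and f(b)
ok_good = check_invariants(track, succ, good)
ok_bad = check_invariants(track, succ, bad)
```

Because Invariant D ties each left endpoint to the right endpoint of the track predecessor, a tentative embedding is determined by the right endpoints alone.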

The $O(n^3)$ algorithm for determining (for $k$-connected partial proper $k$-paths
$G$ and $H$) if $G$ is a minor of $H$ is a modification of the topological embedding
algorithm. Initially, for $a \in V(G)$, the tentative minor embedding sets $f(a)$
to be the single vertex at $(\mathrm{track}(a), \mathrm{position}(a))$. The algorithm then checks
for inconsistencies and moves the right endpoints of intervals as necessary (by
Invariant D, this defines left endpoints). As the new intervals may now overlap
previously existing intervals, we may need to "slide" the right endpoints of those
intervals as well, in a manner similar to the sliding of vertices to the right of an
endpoint of an inconsistent edge in the topological embedding algorithm. The
proofs of correctness and complexity, making use of Lemma 9, are also similar
to those for topological embedding.

Theorem 3. For $G$ and $H$ $k$-connected partial proper $k$-paths, it is possible to
determine whether or not $G$ is a minor of $H$ in $O(n^3)$ time. □

When trying to extend the algorithm to the case where $H$ is no longer proper,
we encounter the difficulty that a star of cross-edges (a set of cross-edges with
one common endpoint) can map into a hair of $H$, and it is difficult to determine
which star to map. Again, if $H$ is a full $k$-path and $G$ a $k$-connected partial
$k$-path, we can resolve this ambiguity, and the minor containment algorithm
extends in the same fashion as the topological embedding algorithm.



6 Extensions and Open Problems 



We can improve the complexity of our algorithms when there are additional
constraints on $G$ and $H$. To specify a particular full $k$-path, a track layout and a
total order on the vertices (consistent with the partial order imposed by tracks)
are sufficient. The neighbors of each vertex are completely determined by the
total order. A full proper $k$-path $G$ can thus be represented as a $k$-character string
$S_G$ derived from the track numbers of the entry vertices in a path decomposition
of $G$ lEESnSl; an extension of this notation allows us to handle full $k$-paths, as
well, by associating with each entry node $a$ the number $h(a)$ of hairs sharing its
attachment clique.

When $G$ and $H$ are both full proper $k$-paths, we can solve subgraph
isomorphism by fixing a string representation of $H$ and then executing string
matching between $S_H$ and each of the $2(k!)$ possible string representations of
$G$. When $G$ and $H$ are both full $k$-paths, we need to determine a matching
such that $a \in V(G)$ matches $u \in V(H)$ if and only if $\mathrm{track}(a) = \mathrm{track}(u)$
and $h(a) \le h(u)$. This extension of string matching can be solved in time
$O(|V(H)| \sqrt{|V(G)|} \log |V(G)|)$ EEna, for a total complexity of $O(n \sqrt{n} \log n)$.
It turns out that for topological embedding of full proper $k$-paths, a slight
extension of this idea suffices to give an $O(n \sqrt{n} \log n)$ algorithm, though we omit
the details.

The most obvious open problem is to extend the algorithms to the case when
both $G$ and $H$ are $k$-connected partial $k$-paths. It is not difficult to construct
dynamic programming algorithms that solve the problems in time; 

a dynamic programming subproblem asks if a “prefix” of a fixed track layout of 
G (a subgraph closed under track predecessor, of which there are only 0{n^)) 
can be mapped onto a particular track layout of H . These algorithms are es- 
sentially a simplification of the 0{n^ -i-fc-i-5.5^ algorithm for fc-connected partial 
fc-trees jGIND4f(IINf)S] . The goal, however, remains the removal of any function 
of $k$ from the exponent. Beyond that, we suspect that the requirement of
$k$-connectivity may yield more useful structural information for partial $k$-trees
than has been discovered to date.

Abstract. We extend the well-studied concept of a graph power to that
of a $k$-leaf power $G$ of a tree $T$: $G$ is formed by creating a node for each
leaf in the tree and an edge between a pair of nodes if and only if the
associated leaves are connected by a path of length at most $k$. By
discovering hidden combinatorial structure of cliques and neighbourhoods, we
have developed polynomial-time algorithms that, for $k = 3$ and $k = 4$,
identify whether or not a given graph $G$ is a $k$-leaf power of a tree $T$,
and if so, produce a tree $T$ for which $G$ is a $k$-leaf power. We believe that
our structural results will form the basis of a solution for more general
$k$. The general problem of inferring hidden tree structure on the basis of
leaf relationships shows up in several areas of application.

1 Introduction 

The results in this paper are derived from two abundant areas of research: graph 
powers and leaf-labeled trees. Both areas contain results of a purely theoreti- 
cal nature as well as applications to such diverse areas as distributed comput- 
ing c m na, computational biology, and mathematical psychology Enna. 

Trees are versatile in their ability to represent relations between data items 
stored in their nodes. In many instances, data items are stored in a subset 
of the nodes (typically leaves); the structure of internal nodes is dictated by 
measures of distance or similarity among leaves. For example, a Steiner tree 
is a tree of minimal length containing every point in a set of inputs; a more 
general formulation is known as an A-tree \nrm\ . A fundamental problem in
computational biology is the reconstruction of the phylogeny, or evolutionary
history, of a set of species or genes, typically represented as a phylogenetic tree
(the reader is referred to papers that review research in the area of evolutionary
history lidK W 99lidS W 99IK W 991 1 ). In a phylogenetic tree, each leaf is labeled by
a distinct known species; a tree is then formed by positing possible ancestors 
that might have led to this set of species.

By viewing the correlations between leaves as distances between nodes in a
graph, we can frame the problem of forming a phylogenetic tree as the problem of
forming a tree from a graph. One such correlation between graphs and trees, or
more generally between graphs and graphs, arises in the notion of graph powers,
where a graph $G$ is the $k$th power of a graph $H$ if nodes $x$ and $y$ are adjacent in
$G$ if and only if the length of the shortest path from $x$ to $y$ in $H$ is at most $k$.
Although in general it is NP-complete to recognize a graph power [MS94], it is
possible to determine if a graph is the power of a tree in time $O(n^3)$, where $n$ is
the number of vertices in the input graph [KC98].

In this paper we introduce the notion of a $k$-leaf power of an unlabeled tree
$T$, where a graph $G$ is the $k$-leaf power of a tree $T$ if there exists a vertex in
$V(G)$ for each leaf in $T$ and an edge in $E(G)$ between vertices $u$ and $v$ if and
only if there is a path of length at most $k$ between the leaves associated with $u$
and $v$ in $T$. The problem of recognizing $k$-leaf powers is inspired by the problem
of forming a phylogenetic tree based on distance thresholds: given a graph $G$ in
which there is an edge for each pair of species at distance at most $k$, the tree $T$
of which $G$ is a $k$-leaf power is a phylogenetic tree in which the associated pair
of leaves is guaranteed to be at distance at most $k$. A related concept is that
of the threshold graph formed on a set of $n$ nodes and a weighting function on
edges by including only edges less than a set threshold; there exist algorithms
to extract a tree from the graph by first finding connected components Ennij.
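
The definition of a k-leaf power can be turned directly into a small program for the forward direction (from a tree to its leaf power). The sketch below assumes, for illustration only, that the tree is given as an adjacency dictionary and that leaves are the vertices of degree one; the recognition problem studied in this paper is the much harder inverse direction.

```python
from collections import deque
from itertools import combinations


def leaf_power(adj, k):
    """Return (leaves, edges) of the k-leaf power of the tree `adj`."""
    leaves = sorted(v for v, nbrs in adj.items() if len(nbrs) == 1)

    def distances_from(src):
        # Breadth-first search gives path lengths in an unweighted tree.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        return dist

    dist = {u: distances_from(u) for u in leaves}
    edges = {(u, v) for u, v in combinations(leaves, 2) if dist[u][v] <= k}
    return leaves, edges


# Internal vertices p and q are adjacent; leaves a, b hang off p and leaf c off q.
tree = {"p": {"q", "a", "b"}, "q": {"p", "c"},
        "a": {"p"}, "b": {"p"}, "c": {"q"}}
leaves3, edges3 = leaf_power(tree, 3)
leaves2, edges2 = leaf_power(tree, 2)
```

For this tree the 3-leaf power is a triangle on a, b, c, while the 2-leaf power contains only the edge between a and b.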

We derive polynomial-time algorithms for recognizing $k$-leaf powers for $k = 3$
and $k = 4$. Our algorithms are based on the hidden structure of $k$-leaf powers, of
independent combinatorial interest; as leaf labels do not play a part in our
algorithms, our work is applicable to arbitrary trees. The complex characterizations
of $k$-leaf powers are based on the structural properties of cliques and
neighbourhoods. The properties are particularly tricky to derive in the presence of internal
nodes that are not the neighbours of leaves: such nodes serve as invisible
entities that subtly alter the structure of the relationships of neighbourhoods. The
lemmas we prove may be helpful not only in generalizing our results to $k > 4$,
but also in unrelated problems on graphs and trees.

We first establish properties of neighbourhoods in trees in Section 3. Next, we
present a representation of the original graph as a clique graph, defined in
Section 4. Section 5 contains polynomial-time algorithms which determine whether
or not a graph $G$ is a 3-leaf power or a 4-leaf power of a tree $T$, and if so,
demonstrate one such $T$. Finally, directions for future research are discussed in
Section 6. In this conference version, nearly all proofs, illustrations, and
statements of auxiliary lemmas are omitted to save space. A few proofs have been
included to give a flavour of the techniques used.

2 Preliminaries 

2.1 Trees and k-Leaf Powers

To help avoid confusion, we will refer to vertices in a tree $T$ and nodes in its $k$-leaf
power $G$. We classify each internal vertex as visible if it has a neighbouring leaf,
or invisible otherwise. Leaves are not defined to be either visible or invisible. A
tree with no invisible vertices is an ideal tree.
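
The visible/invisible classification is easy to compute. The following sketch assumes (for illustration) an adjacency-dictionary encoding of the tree in which leaves are exactly the vertices of degree one.

```python
def classify_internal(adj):
    """Split the internal vertices of a tree into (visible, invisible)."""
    leaves = {v for v, nbrs in adj.items() if len(nbrs) == 1}
    # An internal vertex is visible when at least one neighbour is a leaf.
    visible = {v for v in adj if v not in leaves and adj[v] & leaves}
    invisible = set(adj) - leaves - visible
    return visible, invisible


def is_ideal(adj):
    """A tree is ideal when it has no invisible vertices."""
    return not classify_internal(adj)[1]


# Internal path u - v - w; leaves hang off u and w only, so v is invisible.
tree = {"u": {"v", "a", "b"}, "v": {"u", "w"}, "w": {"v", "c", "d"},
        "a": {"u"}, "b": {"u"}, "c": {"w"}, "d": {"w"}}
visible, invisible = classify_internal(tree)
```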

The case of 2-leaf powers is not interesting; a 2-leaf power G is a set of 
disjoint cliques, each clique corresponding to the leaves of T adjacent to one 
internal vertex. Any tree formed by connecting the internal vertices yields the 
same 2-leaf power. 

When k > 2, the set of leaves of T adjacent to an internal vertex also forms a 
clique in G, but these cliques may overlap and will not, in general, be maximal. 
It is the maximal cliques of G that hold the key to reconstructing T, and we 
must find elements in $T$ that correspond to these cliques. Since a $k$-leaf power is
an induced subgraph of a power of trees, and powers of trees are chordal [KC98],
clearly the graphs we are trying to recognize are chordal. We can check for
chordality and find all maximal cliques in linear time [Gav72, RTL76].

A few easy observations will simplify our task. Given a graph $G$ which is a
$k$-leaf power, we can treat each connected component separately, and connect
the resulting trees by paths of length $k$. Consequently, we can assume (and will
do so for the rest of the paper) that G is connected. Any tree T whose 3-leaf 
power is connected has no invisible vertices (as these would disconnect the 3-leaf 
power) and similarly, any tree T whose 4-leaf power is connected cannot have 
two adjacent invisible vertices. 

The distance $d(u, v)$ between two nodes $u, v$ in a tree $T$ is the number of
edges in the unique path between them. We will find it convenient to define the
distance $d(u, e)$ between a node $u$ and an edge $e = (v, w)$ as $(d(u, v) + d(u, w))/2$.
Note that, in a tree, this is always of the form $j + \frac{1}{2}$ for an integer $j$; intuitively,
the extra half is the amount needed to "get to the center of the edge". Similarly,
we define the distance $d(e_1, e_2)$ between two edges $e_1$ and $e_2$ in a tree as one
more than the number of edges in the unique path between them. Intuitively,
the addition is due to the two extra halves needed to "get to the center" of each
edge. It is not hard to verify that the triangle inequality holds for this extended
notion of distance.
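
The extended distance can be sketched as follows. The encoding is our own assumption: vertices are non-tuple labels, an edge is a 2-tuple of its endpoints, and the distance from an edge to itself is taken to be 0 (a case the text does not discuss).

```python
from collections import deque
from fractions import Fraction


def bfs(adj, src):
    """Path lengths (in edges) from src to every vertex of the tree."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def dist(adj, x, y):
    """Extended distance between two vertices and/or edges of a tree."""
    x_is_edge, y_is_edge = isinstance(x, tuple), isinstance(y, tuple)
    if not x_is_edge and not y_is_edge:
        return Fraction(bfs(adj, x)[y])
    if x_is_edge and y_is_edge:
        if set(x) == set(y):
            return Fraction(0)  # assumption: an edge is at distance 0 from itself
        # One more than the number of edges on the path joining the two edges.
        return Fraction(1 + min(bfs(adj, a)[b] for a in x for b in y))
    if x_is_edge:
        x, y = y, x  # make x the vertex and y the edge
    d = bfs(adj, x)
    return Fraction(d[y[0]] + d[y[1]], 2)


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
d_vertex_edge = dist(path, 1, (2, 3))
d_edge_edge = dist(path, (1, 2), (3, 4))
d_adjacent = dist(path, (1, 2), (2, 3))
```

On the path 1-2-3-4, placing each edge at its midpoint gives the same values: vertex 1 is at distance 3/2 from edge (2, 3), and edges (1, 2) and (3, 4) are at distance 2.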

Definition 1. For $i$ even, the $i$-neighbourhood with center vertex $v$
(respectively, center edge $e$ for $i$ odd) in a tree $T$ is the set of all leaves of distance at
most $i/2$ from internal vertex $v$ (respectively, edge $e = (u, v)$, where $u$ and $v$ are
both internal vertices).

Lemma 1. In a $2k$-leaf (resp. $(2k+1)$-leaf) power $G$ of a tree $T$, the vertices
of any maximal clique $M$ of $G$ form a $2k$-neighbourhood (resp.
$(2k+1)$-neighbourhood) in $T$ of some internal vertex $v$ (resp. edge $e$). □

Proof (even version). Let $P$ be a longest path in $T$ between two points of $M$
and let the endpoints of $P$ be $u$ and $w$. Clearly, $P$ has length at most $2k$. Let
$v$ be the midpoint of $P$ (if $P$ contains an even number of vertices, break the tie
arbitrarily) and let $N$ be the $2k$-neighbourhood of $v$.

First, we show $M \subseteq N$. Let $x$ be a vertex in $M \setminus N$ (that is, $d(v, x) > k$).
Let $z$ be the vertex at the point where the path from $x$ to $v$ meets $P$; $z$ divides
$P$ into two pieces. If both of these pieces have length less than $k$, then the path
from $x$ to either $u$ or $w$ is longer than $P$, contradicting the choice of $P$. So one
piece has length at least $k$; but then the distance from $x$ to the endpoint of this
piece exceeds $2k$, a contradiction. Thus no such vertex $x$ exists.

Next, we show that $N \subseteq M$. Let $y$ be an arbitrary vertex in $N$ and $x$ an
arbitrary vertex of $M$. Since $M \subseteq N$, $d(x, v) \le k$. Thus $d(x, y) \le d(x, v) + d(v, y) \le
2k$, and $x$ and $y$ are connected in $G$. Since $x$ was arbitrary, $y$ is connected to
every vertex in $M$, and by maximality of $M$, $y \in M$. □

The converse of Lemma 1 fails to hold. For example, we can construct a path
$u$, $v$, $w$, and $x$ of internal vertices such that both $v$ and $x$ are invisible, and all
other neighbours of $w$ are leaves. Then, the 4-neighbourhood with center $w$ (the
leaf neighbours of $w$) is a proper subset of the 4-neighbourhood with center $v$
(the leaf neighbours of $u$ and $w$). Even in an ideal tree, vertices "close to the
edge" can have nonmaximal 4-neighbourhoods, in a way quantified in the next
section, where we look at the structure underlying neighbourhoods and maximal
cliques.



3 Properties of Neighbourhoods 

We will discover the hidden structure of the underlying tree of a $k$-leaf power by
intersecting maximal cliques, which are neighbourhoods. The following technical
lemma aids in characterizing the structure of intersections of neighbourhoods.

Lemma 2. For $v_1$ and $v_2$ internal vertices or edges in a tree $T$ such that
$d(v_1, v_2) = r$, and $S_i$ the $k_i$-neighbourhood of $v_i$ for $k_i \ge 2$, $i \in \{1, 2\}$, the
following conditions hold:

(a) if $k_1 + k_2 - 4 < 2r$, then $S_1 \cap S_2 = \emptyset$;

(b) if $2r \le k_1 - k_2$, then $S_2 \subseteq S_1$;

(c) if $2r \le k_2 - k_1$, then $S_1 \subseteq S_2$; and

(d) if $k_1 + k_2 - 4 \ge 2r > |k_1 - k_2|$, then $S_1 \cap S_2$ is the
$\left(\frac{k_1 + k_2}{2} - r\right)$-neighbourhood of the unique vertex/edge whose distance from $v_1$ is
$\frac{k_1 - k_2}{4} + \frac{r}{2}$ and whose distance from $v_2$ is $\frac{k_2 - k_1}{4} + \frac{r}{2}$. □

Proof. We will examine the case where $k_1$, $k_2$, and $r$ are all even. The analysis
for the other cases is very similar. Suppose that $S_1 \cap S_2 \neq \emptyset$ and $w$ is an arbitrary
leaf in $S_1 \cap S_2$. Let $v$ be the unique vertex of $T$ that is adjacent to $w$. Clearly,
$d(v, v_i) \le k_i/2 - 1$, for $i = 1, 2$, and hence (a) follows from the fact that
$r = d(v_1, v_2) \le \frac{k_1 + k_2}{2} - 2$.

Suppose now that $k_1 \ge 2r + k_2$ and let $x \in S_2$. This means that $d(v_2, x) \le
k_2/2$. As $d(v_1, x) \le d(v_1, v_2) + d(v_2, x)$, we can conclude that $d(v_1, x) \le r + k_2/2 \le
k_1/2$ and $x \in S_1$. Therefore, $S_2 \subseteq S_1$ and (b) follows. The proof of (c) is very
similar.

We now let $u$ be the unique vertex of $T$ that is at distance $\frac{k_1 - k_2}{4} + \frac{r}{2}$ from
$v_1$ and distance $\frac{k_2 - k_1}{4} + \frac{r}{2}$ from $v_2$, and $S$ be the $\left(\frac{k_1 + k_2}{2} - r\right)$-neighbourhood
of $u$. We must show that $S = S_1 \cap S_2$. For $x \in S$, $d(x, u) \le \frac{k_1 + k_2}{4} - \frac{r}{2}$. As
$d(u, v_1) = \frac{k_1 - k_2}{4} + \frac{r}{2}$, we conclude that
$d(x, v_1) \le d(x, u) + d(u, v_1) = \frac{k_1 + k_2}{4} - \frac{r}{2} + \frac{k_1 - k_2}{4} + \frac{r}{2} = k_1/2$,
and $x \in S_1$. Similarly, we can show that $x \in S_2$ and thus
$S \subseteq S_1 \cap S_2$.

For $x \notin S$, $d(x, u) > \frac{k_1 + k_2}{4} - \frac{r}{2}$. Deleting $u$ divides $T$ into connected
components; notice that one of the $v_i$'s, say $v_1$, is not in the connected component
that contains $x$. As $v_1$ and $x$ are in different connected components, clearly
$d(x, v_1) = d(x, u) + d(u, v_1) > \frac{k_1 + k_2}{4} - \frac{r}{2} + \frac{k_1 - k_2}{4} + \frac{r}{2} = k_1/2$ and $x$ cannot be
in the $k_1$-neighbourhood of $v_1$. This implies that $x \notin S_1 \cap S_2$, and we conclude
that $S = S_1 \cap S_2$. □

Using Lemma 2 we can easily prove the following results concerning the
structure of neighbourhoods by simply setting the parameters $k_1$, $k_2$, and $r$.
Although stated in a general form, in this paper we apply these primarily in the
case $j = 4$.
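
As a worked instance of this parameter setting (our own illustration, not taken from the original text), let $v_1$ and $v_2$ be adjacent internal vertices, so that $r = 1$, and take $k_1 = k_2 = 4$. The hypothesis of case (d) of Lemma 2 holds, and the quantities in its conclusion evaluate as follows:

```latex
k_1 + k_2 - 4 = 4 \;\ge\; 2r = 2 \;>\; |k_1 - k_2| = 0,
\qquad
\frac{k_1 + k_2}{2} - r = 3,
\qquad
\frac{k_1 - k_2}{4} + \frac{r}{2} \;=\; \frac{k_2 - k_1}{4} + \frac{r}{2} \;=\; \frac{1}{2}.
```

The only vertex or edge at distance $\frac{1}{2}$ from both $v_1$ and $v_2$ is the edge $(v_1, v_2)$, so the intersection of the two 4-neighbourhoods is the 3-neighbourhood of that edge.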

Lemma 3. The following conditions hold for any tree $T$ and $j \ge 4$:

1. The intersection of two distinct $j$-neighbourhoods is either empty or a
$j'$-neighbourhood for $2 \le j' \le j - 1$.

2. No $(j-1)$-neighbourhood is a subset of more than two distinct
$j$-neighbourhoods for $j$ even.

3. If a $(j-2)$-neighbourhood is a subset of a $j$-neighbourhood, then their centers
are either identical or adjacent.

4. The $(j-1)$-neighbourhood of an edge is the intersection (union) of the
$j$-neighbourhoods ($(j-2)$-neighbourhoods) of its endpoints.

5. The $(j-2)$-neighbourhood of a vertex of degree at least two is the intersection
of the $j$-neighbourhoods of any two of its neighbours for $j$ even.

6. Let $S$ be the intersection of two 3-neighbourhoods of two edges $e_1$, $e_2$. If $e_1$
and $e_2$ are adjacent then $S$ is the 2-neighbourhood of their common endpoint;
otherwise $S$ is empty. □

We make use of terminology that distinguishes between types of vertices in
a tree. For $T'$ the tree obtained from $T$ after two successive leaf prunings, we
partition the internal vertices of $T$ into those which are not in $T'$ (marginal
vertices), those which are leaves in $T'$ (peripheral vertices) and those which are
internal nodes in $T'$ (central vertices). Any edge incident on a central vertex is
a central edge. An edge in $T$ is pendant if one of its endpoints is a leaf.
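
The three vertex types can be computed by literally pruning leaves twice. The sketch below uses an adjacency-dictionary encoding and, as an assumption of ours, treats a vertex of degree zero or one as a leaf so that a tree reduced to a single vertex is handled uniformly.

```python
def prune_leaves(adj):
    """Remove every vertex of degree at most one (one leaf pruning)."""
    keep = {v for v, nbrs in adj.items() if len(nbrs) > 1}
    return {v: adj[v] & keep for v in keep}


def classify_by_pruning(adj):
    """Return the (marginal, peripheral, central) internal vertices of a tree."""
    internal = prune_leaves(adj)     # T with its leaves removed
    pruned = prune_leaves(internal)  # T' after the second pruning
    marginal = set(internal) - set(pruned)
    peripheral = {v for v, nbrs in pruned.items() if len(nbrs) <= 1}
    central = set(pruned) - peripheral
    return marginal, peripheral, central


# Internal path v1 - v2 - v3 - v4 - v5, with one leaf attached to each vertex.
path = ["v1", "v2", "v3", "v4", "v5"]
tree = {v: set() for v in path}
for a, b in zip(path, path[1:]):
    tree[a].add(b)
    tree[b].add(a)
for v in path:
    leaf = "l" + v[1]
    tree[v].add(leaf)
    tree[leaf] = {v}

marginal, peripheral, central = classify_by_pruning(tree)
```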

4 Clique Graphs and Their Properties 

Our algorithms rely on the representation of graphs as directed acyclic graphs of 
maximal cliques and their intersections. In this section we introduce the notion 
of a clique graph and establish properties that prove useful in the development 
of our algorithms.

4.1 Clique Graphs

The clique graph $C_G$ of a chordal graph $G$ is a directed acyclic graph, whose
nodes are labeled by cliques of vertices in $G$. The definition of $C_G$ is given
algorithmically. We build $C_G$ by first computing a set of node labels, and then
creating edges. The node label set is initialized to all maximal cliques. Then all
intersections of pairs of existing node labels are added to the set. This intersection
step is repeated one more time, which completes the set of node labels. A node
is created for each label, and an edge added from a node $a$ to a node $b$ if the
label of a is a subset of the label of b. In Lemma 0 we prove linear bounds on the 
numbers of nodes and edges of clique graphs of leaf powers, and the algorithm 
halts if these bounds are exceeded. This is necessary because the clique graph 
of an arbitrary chordal graph could have exponential size (consider a graph on 
vertices $u_1, \ldots, u_{n/2}, v_1, \ldots, v_{n/2}$ in which there is an edge $(u_i, u_j)$ and $(u_i, v_j)$ for
all $i \neq j$). Finally, we construct the transitive reduction of the graph in a naive
fashion, by checking triples of nodes (a, b, c), and removing the edge (a, c) if edges 
(a, b) and (b, c) exist (this is unambiguous since our graph is a directed acyclic 
graph). Beyond ensuring a polynomial running time, we have not attempted 
to optimize this construction; further investigation of the properties of chordal 
graphs may improve the lemma below. 
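
The construction just described can be sketched as follows, taking the maximal cliques as input. Two points are assumptions of this sketch rather than statements of the text: empty intersections are discarded, and edges are added only for proper subsets. The size check that halts the algorithm on graphs that are not leaf powers is omitted.

```python
from itertools import combinations


def clique_graph(maximal_cliques):
    """Return (labels, edges) of the clique graph built from maximal cliques."""
    labels = {frozenset(c) for c in maximal_cliques}
    for _ in range(2):  # the intersection step is performed twice
        new = {a & b for a, b in combinations(labels, 2)}
        labels |= {s for s in new if s}  # assumption: empty labels are dropped
    # Edge a -> b whenever the label of a is a proper subset of the label of b.
    edges = {(a, b) for a in labels for b in labels if a < b}
    # Naive transitive reduction: drop (a, c) when (a, b) and (b, c) both exist.
    redundant = {(a, c) for (a, b) in edges for (b2, c) in edges
                 if b == b2 and (a, c) in edges}
    return labels, edges - redundant


fs = frozenset
labels, edges = clique_graph([{"a", "b", "c"}, {"b", "c", "d"}, {"c", "d", "e"}])
```

In this small example the labels are the three maximal cliques, the pairwise intersections {b, c} and {c, d}, and the single vertex {c}; after the reduction, {c} points only to {b, c} and {c, d}.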

Lemma 4. Given a chordal graph $G = (V, E)$, $C_G$ can be computed in time
$O(|V|^3)$.

Proof (sketch). We first find all $O(|V|)$ maximal cliques in time $O(|V|^2)$ (the
number of cliques and running time are a consequence of the linear-time
recognition of chordal graphs by means of a perfect elimination ordering [RTL76, TY84]).
Next, we form sets of intersections of sets in $O(|V|^3)$ time. Forming the graph
and its transitive reduction can be accomplished naively in cubic time. □

The label of a node $c$ in a clique graph is its clique graph label, denoted
cglabel($c$). We use well-known tree/DAG terminology such as parent, child,
grandparent, grandchild, descendant, and ancestor to describe relationships
between nodes of a clique graph. A node of the clique graph is a border node if
it has a unique parent. Sinks (labeled by maximal cliques) are at level $k - 1$
(where $k - 2$ is the length of the longest directed path in $C_G$) and any other node
is at a level one less than the minimum level of its children. If a node is at level
$j$ we call it a level-$j$ node. We use $C_{j,j+1}$ to denote the underlying undirected
subgraph of $C_G$ induced on edges between nodes at levels $j$ and $j + 1$. Levels of
nodes in a clique graph can easily be found by depth-first search, and we prove
in subsequent sections that the clique graph of a $k$-leaf power has at most $k - 1$
levels (for $k = 3, 4$). The internal vertices of a sample input $T$ are illustrated in
Figure 1; for convenience, sets of leaf neighbours, omitted from the figure, are
indicated by letter labels and invisible vertices are indicated by squares. Figure
2 depicts the clique graph $C$ (for $k = 4$) generated from the tree $T$. In this
example, the nodes with clique graph labels mop, nmo, cde, jk, and gh are all
border nodes.

Fig. 1. Internal vertices of tree T. Letter labels denote sets of leaf neighbours; squares
denote invisible vertices.

Fig. 2. Clique graph C (k = 4) of tree T in Figure 1, with nodes arranged in levels 1 to 3.
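
The level assignment described above (sinks at level $k - 1$, every other node one below its lowest child) can be sketched with a memoized depth-first search. The encoding of the clique graph as a node list plus parent-to-child edge pairs is an assumption made for this illustration.

```python
from functools import lru_cache


def node_levels(nodes, edges):
    """Assign levels to the nodes of a DAG given as (parent, child) edges."""
    children = {v: [] for v in nodes}
    for a, b in edges:
        children[a].append(b)

    @lru_cache(maxsize=None)
    def longest_from(v):
        # Length (in edges) of the longest directed path starting at v.
        return max((1 + longest_from(c) for c in children[v]), default=0)

    # With k - 2 the longest path length, sinks sit at level k - 1.
    sink_level = max(longest_from(v) for v in nodes) + 1

    @lru_cache(maxsize=None)
    def level(v):
        if not children[v]:
            return sink_level
        return min(level(c) for c in children[v]) - 1

    return {v: level(v) for v in nodes}


nodes = ["c", "bc", "cd", "abc", "bcd", "cde"]
edges = [("c", "bc"), ("c", "cd"), ("bc", "abc"),
         ("bc", "bcd"), ("cd", "bcd"), ("cd", "cde")]
levels = node_levels(nodes, edges)
```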



A clique graph is an ideal clique graph if it can be generated from a k- 
leaf power of an ideal tree T. Ideal clique graphs have elegant properties which 
are absent from general clique graphs. We can view a general clique graph as 
having been generated from an ideal clique graph, with subsequent “collapsing” 
occurring at the invisible nodes. 



4.2 General Clique Graphs and Clique Graph Partitioning 

The presence of invisible nodes complicates the characterization of general clique
graphs of 4-leaf powers, for which the correlation between neighbourhoods and
levels is no longer so clean. For example, if $u$, $v$, $w$, and $x$ form a path in $T$
such that $v$ and $x$ are invisible and the other neighbours of $w$ are all leaves,
then the 4-neighbourhood of $w$ is equal to the 2-neighbourhood of $w$ and the
3-neighbourhoods of $(v, w)$ and $(w, x)$, and is a subset of the 4-neighbourhood
of u. As a consequence of the blurring of distinctions between types of neigh- 
bourhoods, intuitively, general clique graphs of 4-leaf powers have the following 
structure: the sections of height 3 look like ideal clique graphs, but the sections 
of level 2 can be arbitrary bipartite trees. This is proved in Theorem^ The fol- 
lowing lemma characterizes a few constraints on the correlations between levels 
and neighbourhoods: 

Lemma 5. The clique graph of a 4-leaf power has at most three levels. In
any path of length three, the labels of the nodes are, in order from source to
sink, a 2-neighbourhood, a 3-neighbourhood, and a 4-neighbourhood. A level-3
node is always a 4-neighbourhood, sometimes a 3-neighbourhood, and never
a 2-neighbourhood. A level-2 node can be a 2-neighbourhood and/or a
3-neighbourhood, but never a 4-neighbourhood. □

Given a three-level clique graph C, we decompose C into a set of subgraphs 
and linking edges. We identify two types of three-level subgraphs, namely nonde- 
generate and degenerate three-level subgraphs, as well as two-level subgraphs. In 
Theorem [D we stipulate additional conditions which ensure that C is the clique 
graph of a 4-leaf power. 

The decomposition algorithm starts by creating a partition $M$ of level-1 nodes
that have at least two level-2 children, where two nodes are in the same set of
the partition if they share a level-2 child. For each set $P$ of the partition, it forms
the subgraph $N_P$ of $C$ induced by $P$, the level-2 children of nodes in $P$, and the
grandchildren of nodes in $P$. These are the nondegenerate three-level subgraphs.
Removing every $N_P$ from $C$ temporarily, the algorithm starts to form the set
$A$ of roots of degenerate three-level subgraphs. While there exists in $C$ a level-1
node $a$ with a level-2 child in $C$, it forms the subgraph $D_a$ of $C$ induced by $a$, its
level-2 children and its grandchildren. $D_a$ is temporarily removed from $C$ and $a$
is added to $A$.
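
The first step of the decomposition, the partition of level-1 nodes, can be sketched with a union-find structure. This is an illustration under two assumptions of ours: the clique graph is given as a `level` dictionary and a `children` dictionary, and "share a level-2 child" is closed transitively so that the result is a partition.

```python
def partition_level1(level, children):
    """Group level-1 nodes having >= 2 level-2 children by shared level-2 children."""
    def level2_children(a):
        return [c for c in children[a] if level[c] == 2]

    roots = [a for a in children
             if level[a] == 1 and len(level2_children(a)) >= 2]
    parent = {a: a for a in roots}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    owner = {}  # level-2 node -> some level-1 node having it as a child
    for a in roots:
        for c in level2_children(a):
            if c in owner:
                parent[find(a)] = find(owner[c])
            else:
                owner[c] = a

    blocks = {}
    for a in roots:
        blocks.setdefault(find(a), set()).add(a)
    return sorted(blocks.values(), key=sorted)


level = {"p": 1, "q": 1, "r": 1, "s": 1,
         "x": 2, "y": 2, "z": 2, "w": 2, "u1": 2, "u2": 2}
children = {"p": ["x", "y"], "q": ["y", "z"], "r": ["w"], "s": ["u1", "u2"],
            "x": [], "y": [], "z": [], "w": [], "u1": [], "u2": []}
blocks = partition_level1(level, children)
```

Here p and q fall into one set because they share the child y, s forms its own set, and r is left out because it has only one level-2 child.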

What remains must be two-level subgraphs. The algorithm forms the sub- 
graph F induced on vertices in C, renaming each level-1 vertex to be a level-2 
vertex, and forms the set E of linking edges, namely all edges of C not in any 
Np, Da, or F. All removed components are restored, and the partitioning is 
done. We must now reason about its effects, and discover enough structure to 
justify the reconstruction algorithm. Figure 3 illustrates the decomposition of 
the clique graph G from Figure 2, with linking edges appearing as dashed lines. 




Fig. 3. Partitioned clique graph G (component labels in the figure: Da, F, Np).



Since the label of a node a of the clique graph C of a 4-leaf power of a
fixed tree T can be both an i-neighbourhood and a j-neighbourhood for $i \neq j$,
we introduce the notion of a range (intuitively, the size of the visible part of
the neighbourhood) and a middle (the center of the visible part of the neigh-
bourhood). More formally, the range of a is the length of the longest path in
T connecting leaves in cglabel(a). The middle of a, denoted middle(a), is the 
vertex/edge of T that is in the middle of such a path. 
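
The definition of range and middle can be sketched as follows. The tree representation (an adjacency dictionary that includes the leaves) and the function names are assumptions made for this illustration; the label is passed as a set of leaf names.

```python
from collections import deque


def tree_path(tree, src, dst):
    """Return the unique path from src to dst in a tree (adjacency dict)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for w in tree[u]:
            if w not in prev:
                prev[w] = u
                queue.append(w)
    path = []
    while dst is not None:
        path.append(dst)
        dst = prev[dst]
    return path[::-1]


def range_and_middle(tree, label):
    """Length of the longest path between leaves in `label`, and its middle."""
    leaves = sorted(label)
    best = [leaves[0]]
    for i, u in enumerate(leaves):
        for v in leaves[i + 1:]:
            p = tree_path(tree, u, v)
            if len(p) > len(best):
                best = p
    k = len(best) - 1                      # path length in edges
    if k % 2 == 0:
        return k, best[k // 2]             # the middle is a vertex
    return k, (best[k // 2], best[k // 2 + 1])  # the middle is an edge
```

An even range gives a vertex as middle and an odd range gives an edge, which matches the vertex/edge distinction used in the lemmas that follow.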

Lemma 6. If c is a node of range k, then cglabel(c) is the k-neighbourhood of
its middle.

Proof. Any vertex in cglabel(c) is at distance at most k/2 from the middle,
otherwise a path of length greater than k connecting vertices in cglabel(c) could
be constructed. Furthermore, the longest path in an i-neighbourhood with cen-
ter vertex/edge v is of length at most i. To see this, let the endpoints of the
longest path be $v_1$ and $v_2$, and note that $d(v_1, v_2) \le d(v_1, v) + d(v, v_2) \le i$.
Thus cglabel(c) cannot be an i-neighbourhood for i < k, and it must be a k-
neighbourhood. □

We can extract structure by examining middles in conjunction with levels of 
nodes. 

Lemma 7. For c a node in a clique graph C of the 4-leaf power of a tree T,

1. if middle(c) is a vertex v for a level-3 node c, then cglabel(c) is the 4-
neighbourhood of v and v has at least two visible neighbours;

2. if middle(c) is a vertex v for a level-2 node c, then cglabel(c) is the 2-
neighbourhood of v and v is visible; and

3. if middle(c) is an edge e, then cglabel(c) is the 3-neighbourhood of e and both
endpoints of e are visible. □

We are now ready for the main theorem on decompositions of clique graphs. 

Theorem 1. The following conditions are true of the clique graph C of the
connected 4-leaf power G of a tree T:

1. For $P_1$ and $P_2$ distinct sets in the partition M, $N_{P_1}$ and $N_{P_2}$ do not intersect.

2. For $a_1$ and $a_2$ distinct vertices in A, $D_{a_1}$ and $D_{a_2}$ do not intersect.

3. For any $P \in M$ and $a \in A$, $N_P$ and $D_a$ do not overlap.

4. Each $N_P$ is isomorphic to a (necessarily ideal) clique graph of an ideal sub-
tree.

5. In each $D_a$, a has only one child and exactly two grandchildren.

6. F is a forest without edges connecting nodes of the same level.

7. A linking edge can connect either a level-1 node of a three-level subgraph and
a level-3 node of a two-level subgraph (central linking edge), a level-2 node
(formerly level-1) of a two-level subgraph and a level-2 node of a three-level
subgraph (peripheral linking edge), or a level-2 node of a two-level subgraph
and a level-3 node of a three-level subgraph (marginal linking edge).

Proof (sketch). Some of the conditions follow with a bit of reasoning about
properties of clique graphs already proven, and others are direct consequences
of the algorithm. The hardest parts to prove are (4) and (6). (4) is proved by
identifying an ideal subtree of T and then proving that its clique subgraph is
isomorphic to the component $N_P$; (6) is proved by showing that $C_{2,3}$, which
contains F as a subgraph, is topologically equivalent to a forest F* created from
T. These last two proofs occupy several typeset pages and make good use of the
preceding lemmas. Finally, (7) is proved using tools developed in these proofs. □

Referring to vertices by their clique graph labels, in Figure 3, the edges from 
m to ma and from c to cj are central linking edges, the edge from d to cd is a 
peripheral linking edge, and the edge from a to abc is a marginal linking edge. 

Finally, we can quantify the constants in the linear bounds on the number 
of nodes and edges in the clique graph of a 4-leaf power. 

Lemma 8. If G is the connected 4-leaf power of a tree T, then the clique graph
algorithm generates at most 6n nodes and at most 18n edges.

Proof (sketch). T has at most 2n internal vertices since no two invisible vertices
can be adjacent (otherwise the 4-leaf power is disconnected) and every visible
vertex has at least one associated leaf. Since each node in the clique graph is a 2-,
3-, or 4-neighbourhood of an internal vertex of T, there are at most 6n possible
nodes in the clique graph. Each edge between a level-1 node and a level-2 node
is either part of a three-level subgraph or is a peripheral linking edge, so we can 
prove that these form a tree and hence number at most 6n. Similarly, the edges 
between level-2 and level-3 nodes number at most 6n. 

We finally determine the number of edges that may be generated between 
level-1 and level-3 nodes (some of which will be subsequently deleted in the 
transitive reduction). There is an edge from a level-1 node (a 2-neighbourhood) 
to each 4-neighbourhood containing it. Since the 2-neighbourhood of a vertex v 
is contained in exactly the 4-neighbourhoods of v itself and all of v’s neighbours, 
the total number of edges is the sum over all internal vertices of the degree of 
the vertex plus one. This sum equals the number of internal vertices plus twice 
the number of nonpendant edges in T, and the total is at most 6n. □ 
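
The arithmetic of this last step can be written out as a short derivation. The symbols $I$ (the number of internal vertices of T) and $\deg_{\mathrm{int}}(v)$ (the number of internal neighbours of $v$) are introduced here only for illustration. The derivation assumes $I \ge 1$ and uses two facts: the internal vertices of a tree induce a subtree, which therefore has $I - 1$ edges, and $I \le 2n$ from the first part of the proof.

```latex
\begin{align*}
\sum_{v \text{ internal}} \bigl(\deg_{\mathrm{int}}(v) + 1\bigr)
  &= I + 2\,\bigl|\{\text{nonpendant edges of } T\}\bigr| \\
  &= I + 2(I - 1) \\
  &= 3I - 2 \;\le\; 6n - 2 \;<\; 6n .
\end{align*}
```

Adding the three bounds of 6n (level-1 to level-2, level-2 to level-3, and level-1 to level-3 edges) gives the 18n of the lemma.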

5 Reconstructing the Underlying Tree of a 4-Leaf Power 

We will briefly sketch the intuition behind our reconstruction algorithms be-
fore giving details. For k = 3, our assumption that G is connected makes its
clique graph ideal. As a result, simple local replacement in the clique graph
will construct a suitable tree. Due to space reasons, we omit the algorithm and
justification, which is also simple. For k = 4, things are not so simple. Each
subgraph of the partitioned clique graph is treated by an appropriate form of
local replacement, and the trees thus obtained are joined in a suitable manner.
For clarity, we refer to nodes in the clique graph and vertices in the created tree
T. In the course of the algorithm, vertices are labeled with subsets of V(G).

The algorithm first creates a partitioned clique graph. For each three-level
nondegenerate component N, it checks for the following properties, which
(by an omitted technical lemma) must be true of any ideal clique graph of height
3. Every level-1 node must have at least two children, and all its children must
have a unique common level-3 child. Every level-2 node must have exactly two
children and if it has two parents, its label is the union of theirs. Two or more
parents of a level-3 node must have a unique common parent.

If these conditions are satisfied, the algorithm creates a subtree $T_N$ for
each nondegenerate three-level component N as described below. $T_N$ is initially
empty. For each level-1 node a in N, it creates a vertex $t(a)$ labeled cglabel(a). If
level-1 nodes a and b share a child, it creates the edge $(t(a), t(b))$. For each border
node a with parent b, it creates a vertex $t(a)$ in $T_N$ labeled cglabel(a) \ cglabel(b),
and the edge $(t(a), t(b))$. For each level-3 node a with two parents b and c such
that $A = \mathrm{cglabel}(a) \setminus (\mathrm{cglabel}(b) \cup \mathrm{cglabel}(c))$ is nonempty, it creates a vertex $t(a)$
labeled A, and for d the common parent of b and c, it creates the edge $(t(a), t(d))$.

Next each degenerate three-level component D is checked to ensure that it is a
degenerate ideal clique graph (one level-1 and one level-2 node, at most two level-
3 nodes), and from it a tree $T_D$ is formed as described below. For the level-1 node
a, the algorithm creates $t(a)$ labeled cglabel(a); for the level-2 node b, it creates
$t(b)$ labeled cglabel(b) \ cglabel(a); and for the level-3 nodes c and d, it creates $t(c)$
labeled cglabel(c) \ cglabel(b) and $t(d)$ labeled cglabel(d) \ cglabel(b), as well as
edges $(t(d), t(a))$, $(t(a), t(b))$, and $(t(b), t(c))$. Figure 4 illustrates the subtrees derived
for the three-level components of Figure 3.




Fig. 4. Subtree derived from three-level components of Figure 3. 



The subgraph F induced on nodes in C not in any N or D must be a forest
of nodes at levels 2 and 3. If it is, subtrees of T are created from its components.
For each tree S in F, an initially empty subtree $T_S$ is created. For each level-2
node a in S, the algorithm creates a vertex $t(a)$ labeled cglabel(a). For each
level-3 node a, if $A = \mathrm{cglabel}(a) \setminus \bigcup_{b \text{ parent of } a} \mathrm{cglabel}(b)$ is empty, it creates
a vertex $t(a)$ with the empty label, and an edge $(t(a), t(b))$ for each parent b
of a. Otherwise, it creates a vertex $t(a)$ labeled A, a vertex $v_a$ with the empty
label, the edge $(t(a), v_a)$, and an edge $(t(b), v_a)$ for each parent b of a. Figure 5
illustrates this process.

The union of all $T_N$, $T_D$, and $T_S$ forms a labeled forest L. Subtrees are
connected as specified by linking edges. For each central linking edge (a, b), a in
$T_N$ or $T_D$, b in $T_S$, it creates the edge $(t(a), v_b)$, and removes the label of $t(a)$
from the label of $t(b)$. For each peripheral or marginal linking edge (a, b), a in
$T_N$ or $T_D$, b in $T_S$, it identifies the nodes $t(a)$ and $t(b)$.

Fig. 5. Subtrees derived from two-level components of Figure 2.

Finally, the vertex label of each vertex v is replaced by a set of leaves with 
those names adjacent to v. Figure 6 shows the reconstructed tree for the running 
example. Although it is not identical to Figure 1 (as a clique graph can represent 
more than one possible tree), it differs only in the absence of invisible vertices
between g and h and between j and k. 




Fig. 6. Tree generated from clique graph of Figure 2 by reconstruction algorithm. As 
before, sets of leaf neighbours are indicated by letter labels, and invisible vertices by 
squares. 



The correctness of the algorithm follows from the two lemmas below. The 
first shows that the 4-leaf power of the constructed tree T is a subgraph of G. 
The second shows that G is a subgraph of the 4-leaf power of T. 
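
Both inclusions can be checked mechanically on small instances by computing the k-leaf power of a candidate tree and comparing it with G. The sketch below is such a brute-force check, not the reconstruction algorithm itself. It assumes the tree is given as an adjacency dictionary that includes its leaves and that the leaves are exactly the degree-1 vertices; the function names are illustrative.

```python
from collections import deque


def leaf_power(tree, k):
    """Edge set of the k-leaf power of `tree` (adjacency dict with leaves)."""
    leaves = [v for v, nbrs in tree.items() if len(nbrs) == 1]
    edges = set()
    for u in leaves:
        dist = {u: 0}
        queue = deque([u])
        while queue:                      # BFS truncated at depth k
            x = queue.popleft()
            if dist[x] == k:
                continue
            for y in tree[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        for v in leaves:
            if v != u and v in dist:      # v is within distance k of u
                edges.add(frozenset((u, v)))
    return edges


def is_leaf_root(tree, k, graph_edges):
    """True if `tree` is a k-leaf root of the graph with the given edges."""
    return leaf_power(tree, k) == {frozenset(e) for e in graph_edges}
```

Equality of the two edge sets corresponds to the two subgraph statements taken together.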

Lemma 9. If leaves u and v are of distance at most four in T, then there exists
an edge (u, v) in G.

Proof (sketch). In the tree T formed by the algorithm, we consider all possible
parents p and q of u and v such that the distance between p and q is at most
two, and show that in each case (u, v) must have been an edge in G. □



Lemma 10. If (u, v) is an edge in G, then u and v are of distance at most four
in T.

Proof (sketch). Since u and v are leaves of T, it suffices to show that their
associated internal vertices are of distance at most two in T. The edge (u, v)
must be in some maximal clique of G, and so u and v appear together in some
label of a level-3 node a in C. The proof proceeds by looking at what happens to
node a during the algorithm, and in which vertex labels u and v can be found. □



Theorem 2. Given a graph G with n vertices and e edges, it is possible in time
$O(n^3)$ to determine whether or not G is a 4-leaf power or a 3-leaf power of a
tree T, and if so, to determine such a T.

Proof (sketch). We have seen that clique graph generation takes $O(n^3)$ time,
and produces a clique graph with $O(n)$ vertices and edges; partitioning and
local replacement then clearly take time $O(n)$. □

6 Conclusions and Further Work 

Reconstructing the k-leaf powers of ideal trees for k > 4 would be easy (by gen-
eralizing the part of the reconstruction algorithm that deals with nondegenerate
three-level subgraphs) but less interesting than handling more general trees. We
believe the clique graph approach offers promise for the general case, though
more work is needed to quantify exactly how collapses occur as a result of invisi-
ble vertices. The main stumbling block appears to be the combinatorial explosion
in the number of cases in the analysis of the extension of results like Theorem 1,
which may be controlled by discovery of further general structure. It might also
be possible to extend these techniques to consider the case of weighted edges in
the tree T.

Among the objections to practical use of the algorithms is that the number of 
trees that correspond to a particular k-leaf power could be very large. It might be
interesting to determine all corresponding trees, or perhaps all trees that satisfy
a given set of additional constraints.

Abstract. We apply Lekkerkerker and Boland’s recognition algorithm
for triangulated graphs to the class of weakly triangulated graphs. This 
yields a new characterization of weakly triangulated graphs, as well as a 
new recognition algorithm which, unlike the previous ones, is not based 
on the notion of 2-pair, but rather on the structural properties of the 
minimal separators of the graph. It also gives the strongest relationship 
to the class of triangulated graphs that has been established so far. 



1 Introduction 

Weakly triangulated graphs were introduced by Hayward as a natural ex-
tension of the perfect class of triangulated graphs. A graph is triangulated, or
chordal, if it does not contain a chordless cycle on four or more vertices. A graph 
is weakly triangulated if neither the graph nor its complement contains a chord- 
less cycle on five or more vertices, or equivalently, the graph contains neither a 
hole nor an antihole. A graph with no hole can fail to be perfect, but Hayward 
proved that for weakly triangulated graphs perfection is preserved. 

This class has given rise to a continuous flow of research. In particular, time
complexity for recognition of the class has steadily improved over the years.
Hayward proposed a polynomial-time recognition algorithm for weakly triangulated
graphs that checks for the presence of a hole in the graph and then in its
complement. This was improved by Spinrad’s hole-finding procedure.

Hayward, Hoang, and Maffray characterized weakly triangulated graphs
by the presence of a 2-pair: a pair {a, b} of non-adjacent vertices such that every
chordless path from a to b has exactly two edges. Arikati and Rangan gave an
efficient algorithm for finding a 2-pair, and Spinrad and Sritharan used this
to improve the recognition time by repeatedly finding a 2-pair {a, b} and
adding the edge ab, until the graph becomes complete. Their idea is that adding
an edge between the vertices that make up a 2-pair preserves the property of
being weakly triangulated, because if {a, b} is a 2-pair, then a and b together
cannot belong to a hole or an antihole. Note that this also implies that an edge
ab which is a 2-pair of the complement graph (called a co-pair) can likewise
be deleted without changing the property of being weakly triangulated.



M.M. Halldorsson (Ed.): SWAT 2000, LNCS 1851, pp. 139-EHl 2000.

Hayward showed the presence of a separable set of edges, called a
handle, which form a connected subset such that all the edges of the handle see
all the vertices of the corresponding separator. This notion was used recently by
Hayward, Spinrad, and Sritharan to give an $O(m^2)$ recognition algorithm,
which finds a set of co-pairs by computing a handle of a handle recursively, and
repeatedly removing a co-pair, until no edge is left in the graph.

In this paper, we introduce an algorithm for recognition of weakly trian-
gulated graphs that is not based on the notion of 2-pair or co-pair. Several
attempts have been made to establish a structural relationship between triangu-
lated graphs and weakly triangulated graphs. We establish a strong connection
between these two classes of graphs using the notion of LB-simpliciality, which
we will define in Section 3. The proof of this new structural relationship and a
recognition algorithm based on this result are given in Section 4. This introduces
a totally new and different approach to the recognition of weakly triangulated
graphs. We define LB-simpliciality for edges, and we simply check that all the
edges of the given graph are LB-simplicial. Although we believe that our algo-
rithm can be implemented to match the time bound $O(m^2)$, the proven time
complexity is $O(n^2 m)$. Among the strengths of the presented algorithm are its
simplicity and elegance. In addition, the algorithm is highly parallel in spirit. The
edges of the given graph can be checked for the desired property independently
of each other in any order, or in particular in parallel. None of the previous
algorithms for recognition of weakly triangulated graphs has this property.

2 Preliminaries 

All graphs in this work are undirected and finite. A graph is denoted $G = (V, E)$,
with $n = |V|$ and $m = |E|$. $G(A)$ denotes the subgraph induced by a vertex set
$A \subseteq V$, but we will often denote it simply by $A$ when there is no ambiguity;
$\overline{G}(A)$ denotes the subgraph induced by $A$ in the complement of $G$.

A graph is complete if all of its vertices are pairwise adjacent. A clique in a
graph is a complete subgraph. We denote a path on k vertices by $P_k$. A chord
is an edge between two non-consecutive vertices of a path or a cycle. A hole
is an induced chordless cycle on five or more vertices, and an antihole is the
complement of a hole. In this paper, we will regard all subgraphs as vertex sets.

The neighborhood of a vertex x is $N(x) = \{y \neq x \mid xy \in E\}$; we will say
that a vertex x sees another vertex y if $xy \in E$. The neighborhood of a set of
vertices A is $N(A) = \bigcup_{x \in A} N(x) - A$. A vertex is simplicial if its neighborhood
is a clique. For a set of vertices A, a confluence point is a vertex of A that sees
all the vertices in $N(A)$.

In order to have analogous definitions for edges, we regard an edge ab as a
set of vertices {a, b}. The neighborhood of an edge is a vertex set. We will let
$N(ab)$ denote the neighborhood of edge ab, i.e. $N(\{a, b\})$. Hence, an edge ab sees
a vertex x if either a or b (or both) sees x.
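
These definitions translate directly into set operations. The sketch below assumes a graph stored as a dictionary `adj` mapping each vertex to the set of its neighbours; this representation and the function names are chosen for illustration only.

```python
def N(adj, A):
    """Neighborhood of a vertex set A: vertices outside A seen by A."""
    A = set(A)
    return set().union(*(adj[x] for x in A)) - A


def is_clique(adj, A):
    """True if the vertices of A are pairwise adjacent."""
    A = set(A)
    return all(A - {u} <= adj[u] for u in A)


def is_simplicial(adj, x):
    """A vertex is simplicial if its neighborhood is a clique."""
    return is_clique(adj, adj[x])


def confluence_points(adj, A):
    """Vertices of A that see all the vertices in N(A)."""
    nbhd = N(adj, A)
    return {x for x in A if nbhd <= adj[x]}
```

An edge ab is handled by passing the set `{a, b}`, so `N(adj, {"b", "c"})` is the neighborhood of edge bc.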

For $X \subseteq V$, $\mathcal{C}(X)$ denotes the set of connected components of $G(V - X)$
(connected components are also vertex sets). $S \subseteq V$ is called a separator if
$|\mathcal{C}(S)| \ge 2$, an ab-separator if a and b are in different connected components of
$G(V - S)$, a minimal ab-separator if S is an ab-separator and no proper subset of S
is an ab-separator, and a minimal separator if there is some pair {a, b} such that
S is a minimal ab-separator. Equivalently, S is a minimal separator if there exist
$C_1$ and $C_2$ in $\mathcal{C}(S)$ such that $N(C_1) = N(C_2) = S$. A component C of $\mathcal{C}(S)$ is
called full if $N(C) = S$. $\mathcal{S}(G)$ denotes the set of minimal separators of G. A set
$A \subseteq E$ is separable if $N(A)$ is a minimal separator.
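
The characterization by full components gives a direct test for minimal separators. The following is a minimal sketch under the same assumed representation (a dictionary `adj` of neighbour sets), with illustrative function names.

```python
def components(adj, removed):
    """Connected components of G(V - removed), each as a set of vertices."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in adj[u] - seen:
                seen.add(w)
                comp.add(w)
                stack.append(w)
        comps.append(comp)
    return comps


def neighborhood(adj, A):
    """N(A): vertices outside A adjacent to some vertex of A."""
    A = set(A)
    return set().union(*(adj[x] for x in A)) - A


def full_components(adj, S):
    """Components C of G(V - S) with N(C) = S."""
    S = set(S)
    return [C for C in components(adj, S) if neighborhood(adj, C) == S]


def is_minimal_separator(adj, S):
    """S is a minimal separator iff it has at least two full components."""
    return len(full_components(adj, S)) >= 2
```

On the path a-b-c-d the set {b, c} is a separator but not a minimal one, since neither remaining component sees both b and c.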

3 Connections between Triangulated
and Weakly Triangulated Graphs 

The notion of a simplicial vertex in a triangulated graph was introduced inde-
pendently by Dirac and by Lekkerkerker and Boland as an extension of
the notion of a leaf in a tree, and is the basis for the following theorem by Dirac:

Theorem 1. Any non-complete triangulated graph has at least two non-
adjacent simplicial vertices.

This led Fulkerson and Gross to define their famous and characterizing sim-
plicial elimination scheme:

Characterization 1. A graph is triangulated iff one can repeatedly find a
simplicial vertex and delete it from the graph, until no vertex is left.
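
Characterization 1 can be turned into a naive test by greedily deleting any simplicial vertex. The sketch below is deliberately simple and makes no attempt at the linear-time bound mentioned later in this section; it assumes a dictionary `adj` of neighbour sets and vertex names other than `None`.

```python
def is_triangulated(adj):
    """Greedy simplicial elimination; True iff every vertex gets deleted."""
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    while adj:
        simplicial = next(
            (v for v, nb in adj.items()
             if all(nb - {x} <= adj[x] for x in nb)),   # N(v) is a clique
            None,
        )
        if simplicial is None:
            return False            # no simplicial vertex is left
        for u in adj.pop(simplicial):
            adj[u].discard(simplicial)
    return True
```

Picking an arbitrary simplicial vertex is safe because deleting a vertex from a triangulated graph leaves a triangulated graph.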

Hayward proposed a construction scheme for weakly triangulated graphs
which is inspired by this elimination scheme. He notes the following: 1. Trian-
gulated graphs can be generated by repeatedly adding a vertex which is not the
middle vertex of a $P_3$; this added vertex is precisely a simplicial vertex. 2. Like-
wise, weakly triangulated graphs can be generated by repeatedly adding an edge
which is not the middle edge of a $P_4$. This result shows that an edge in a weakly
triangulated graph plays a role similar to the one a vertex plays in a triangulated
graph.

Another interesting property of the same essence was suggested by Kratsch.
In a triangulated graph, for every minimal separator S, every component
C of $G(V - S)$ contains a confluence point. Kratsch showed that in a weakly
triangulated graph, for every minimal separator S, every full component C con-
tains either a confluence point or a confluence edge, i.e. an edge e such that
$N(C) \subseteq N(e)$. Independently of this result, Hayward introduced the stronger
notion of S-saturating edge, which will be defined in Section 4.

Our contribution in this paper is the extension of a characterization of tri-
angulated graphs due to Lekkerkerker and Boland, which implicitly uses
separation. In a paper contemporary to Dirac’s, they show that interval graphs 
are the graphs that are both triangulated and AT-free (i.e. devoid of asteroidal 
triples). Their search for an efficient recognition algorithm led them to study 
triangulated graphs and to propose a characterization for these, which, with the 
hindsight we now have on minimal separation, can be expressed in the following 
fashion:

Characterization 2. A graph is triangulated iff for every vertex x, all the
minimal separators included in $N(x)$ are cliques.



In order to simplify notations, we use the following definition (the abbreviation 
LB refers to Lekkerkerker-Boland in all contexts throughout the paper): 



Definition 1. A vertex is LB-simplicial if all the minimal separators included 
in its neighborhood are cliques. 



Characterization 2 can thus be reformulated in the following way:



Characterization 3. A graph is triangulated iff every vertex is LB-simplicial. 



Linear-time algorithms for the recognition of triangulated graphs are based
on Characterization 1, as they require computing a simplicial ordering, which
was first done efficiently by the famous algorithm known as LexBFS, due to
Rose, Tarjan, and Lueker.

Recently, Berry, Bordat, and Heggernes used Characterization 3 to
compute a minimal triangulation of a graph by checking (in an arbitrary order)
the vertices for LB-simpliciality, and adding the necessary edges whenever an
anomaly is detected.

In this paper, we extend the notion of LB-simplicial vertex to that of LB- 
simplicial edge, and show how we can derive an elegant and straightforward 
recognition algorithm for weakly triangulated graphs by simply checking each 
edge for LB-simpliciality. Thus, by extending Characterization 3, we will prove
in the next section that a graph is weakly triangulated iff every edge is LB- 
simplicial. 



4 Weakly Triangulated Graph Recognition 



In this section, we extend Lekkerkerker and Boland’s algorithm for the recogni- 
tion of triangulated graphs to weakly triangulated graphs. 



4.1 Lekkerkerker and Boland’s Algorithm for Triangulated Graph
Recognition



Translated into our terminology, Lekkerkerker and Boland’s algorithm is the 
following:

Algorithm LB-TG

input : A graph G = (V, E), with |V| = n and |E| = m.
output : An answer to the question: “Is G triangulated?”

begin
    foreach v ∈ V do
        if v is not an LB-simplicial vertex then
            return(G is not triangulated);
    return(G is triangulated);
end



Checking for LB-simpliciality of a vertex x requires computing the set of
minimal separators included in the neighborhood of x. In general, generating
minimal separators can be done by computing the neighborhoods of the con-
nected components resulting from the removal of certain vertex sets [13].
In [18] the minimal separators in the neighborhood of vertex x are computed
in the following way: for each component C in $\mathcal{C}(\{x\} \cup N(x))$, compute $N(C)$,
which is a minimal separator included in $N(x)$. With the following theorem,
we give a formal description of all the minimal separators included in a clique
neighborhood.

Theorem 2. Let K be a clique of a graph G. The set of minimal separators
included in $N(K)$ is exactly $\mathcal{M} = \{N(C) \mid C \in \mathcal{C}(K \cup N(K))\}$.

Proof. For each $C \in \mathcal{C}(K \cup N(K))$, $N(C)$ is a separator that separates C from
K. Thus $G(V - N(C))$ has at least two components; one is C, and another is the
one containing K. Since $N(C) \subseteq N(K)$, both these components have the same
neighborhood, $N(C)$, and consequently $N(C)$ is a minimal separator. We have
to also show that there are no minimal separators included in $N(K)$ outside of
$\mathcal{M}$. Assume that $S \subseteq N(K)$ is a minimal separator. Then there must exist at
least two components $C_1$ and $C_2$ in $G(V - S)$ such that $N(C_1) = N(C_2) = S$
(i.e. full components). Since K is a clique, and $S \subseteq N(K)$, the whole of K must
be included in some full component. Let $C_1$ be a full component not containing
K. Since $C_1$ cannot contain any neighbor of K, $C_1$ must belong to $\mathcal{C}(K \cup N(K))$
and the proof is complete.
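
Theorem 2 makes Algorithm LB-TG straightforward to sketch: for each vertex x, take the components of $G(V - (\{x\} \cup N(x)))$ and check that each of their neighborhoods is a clique. The code below is a plain illustration with no attention to running time; the dictionary-of-sets representation `adj` and all function names are assumptions.

```python
def components(adj, removed):
    """Connected components of G(V - removed)."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in adj[u] - seen:
                seen.add(w)
                comp.add(w)
                stack.append(w)
        comps.append(comp)
    return comps


def neighborhood(adj, A):
    A = set(A)
    return set().union(*(adj[x] for x in A)) - A


def is_clique(adj, A):
    return all(A - {u} <= adj[u] for u in A)


def separators_in_neighborhood(adj, K):
    """Minimal separators included in N(K) for a clique K (Theorem 2)."""
    closed = set(K) | neighborhood(adj, K)
    return [neighborhood(adj, C) for C in components(adj, closed)]


def is_lb_simplicial_vertex(adj, x):
    return all(is_clique(adj, S) for S in separators_in_neighborhood(adj, {x}))


def lb_tg(adj):
    """Algorithm LB-TG: G is triangulated iff every vertex is LB-simplicial."""
    return all(is_lb_simplicial_vertex(adj, v) for v in adj)
```

The same `separators_in_neighborhood` helper applies to an edge ab by passing `{a, b}`, since an edge is a clique on two vertices.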

4.2 A Characterization of Weakly Triangulated Graphs by 
LB-Simplicial Edges 

Our approach is based on Theorem 1 from Hayward’s original paper, which
we express as:

Theorem 3. Let G be a weakly triangulated graph, and let S be a minimal
separator of G such that $\overline{G}(S)$ is connected. Then in each full component C of
$\mathcal{C}(S)$, there is a vertex that sees all the vertices of S.

Hayward derives the following concept:

Definition 2. Given a set S of vertices, an edge e of $G(V - S)$ is said to be
S-saturating if, for each component $S_j$ of $\overline{G}(S)$, at least one endpoint of e sees
all vertices of $S_j$.

and shows that in each full component of a minimal separator in a weakly tri-
angulated graph, there is either a confluence point or an S-saturating edge.

The following definition is central for the main result of this paper. We define 
an LB-simplicial edge based on the role such an edge plays in a weakly triangu- 
lated graph. The importance of this notion for weakly triangulated graph recog- 
nition is analogous to that of an LB-simplicial vertex for triangulated graphs. 

Definition 3. An edge e of E is LB-simplicial if, for each minimal separator S
included in the neighborhood of e, e is S-saturating.

We will quite naturally consider as LB-simplicial an edge e such that $e \cup N(e) = V$.

According to Theorem 2, the set of minimal separators included in the neigh-
borhood of an edge e can be computed in the following fashion: for each com-
ponent C of $\mathcal{C}(e \cup N(e))$, compute $N(C)$.

Theorem 4. (Main Theorem) A graph $G = (V, E)$ is weakly triangulated iff
every edge of E is LB-simplicial.

Proof. We will also prove a slightly stronger property, namely that an LB-
simplicial edge cannot belong to a hole.

($\Leftarrow$) Let G be a graph in which every edge is LB-simplicial.

1. Suppose that G has a hole $x_1 x_2 \ldots x_k$, $k \ge 5$. Clearly, $x_4, \ldots, x_{k-1}$ belong
to the same component C of $\mathcal{C}(x_1 x_2 \cup N(x_1 x_2))$, and $x_3$ and $x_k$ belong to
the same connected component $S_1$ of $\overline{G}(N(C))$, where $N(C)$ is a minimal
separator and a subset of $N(x_1 x_2)$. As $x_2$ fails to see $x_k$ and $x_1$ fails to see
$x_3$, edge $x_1 x_2$ cannot be $N(C)$-saturating, which contradicts the assumption
that $x_1 x_2$ is an LB-simplicial edge. Note that this argument holds for any
edge of a hole, thus no edge of a hole can be LB-simplicial.

2. Suppose that G has an antihole which is the complement of a hole on
$x_1 x_2 \ldots x_k$, $k \ge 5$ (for $k = 5$, $x_1 x_2 \ldots x_5$ is also a hole). Thus $x_2 x_k$ is an
edge that fails to see $x_1$. Let C be the component of $\mathcal{C}(x_2 x_k \cup N(x_2 x_k))$
containing $x_1$, so that $N(C)$ is a minimal separator included in $N(x_2 x_k)$.
Vertices $x_3, \ldots, x_{k-1}$ are all in the neighborhood of both $x_1$ and $x_2 x_k$, thus
they all belong to $N(C)$. Clearly, $x_3, \ldots, x_{k-1}$ belong to the same connected
component $S_1$ of $\overline{G}(N(C))$. But $x_k$ fails to see $x_{k-1}$ and $x_2$ fails to see $x_3$,
thus edge $x_2 x_k$ is not $N(C)$-saturating, and fails to be LB-simplicial.

($\Rightarrow$) Let G be a weakly triangulated graph, and suppose some edge ab fails
to be LB-simplicial. Let $S = N(C)$ be a minimal separator contained in the
neighborhood of ab for which ab fails to be S-saturating, let $S_1$ be a connected
component of $\overline{G}(S)$ such that neither a nor b sees all the vertices of $S_1$, and
consider the subgraph $G'$ induced by $C \cup S_1 \cup ab$. As any subgraph of a weakly
triangulated graph is itself weakly triangulated, $G'$ must be weakly triangulated.
$S_1$ is a minimal separator of $G'$ with two full components: {a, b}, for which ab is
the only edge, and C. Moreover, $\overline{G'}(S_1)$ is connected. Neither a nor b sees all the
vertices of $S_1$, which contradicts Theorem 3.

4.3 Recognition Algorithm 

Theorem 4 yields a recognition algorithm as a direct application:



Algorithm LB-WT

input : A graph G = (V, E), with |V| = n and |E| = m.
output : An answer to the question: “Is G weakly triangulated?”, and if G is
         not, an edge e that belongs to a hole or an antihole.

begin
    foreach e ∈ E do
        if e is not an LB-simplicial edge then
            return(G is not weakly triangulated, and e belongs to a hole or
                   an antihole);
    return(G is weakly triangulated);
end
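
A direct, unoptimized rendering of Algorithm LB-WT is sketched below. It is not the implementation analyzed in the complexity section: separators are recomputed for every edge and no shared data structure is used. The sketch assumes that saturation is tested against the connected components of the complement of G induced on the separator, that the graph is a dictionary `adj` of neighbour sets, and that vertex names are mutually comparable so that each edge is visited once. All function names are illustrative.

```python
def components(adj, vertices, complement=False):
    """Components of G (or of its complement) induced on `vertices`."""
    vertices = set(vertices)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            reach = vertices - adj[u] if complement else vertices & adj[u]
            for w in reach - seen:
                seen.add(w)
                comp.add(w)
                stack.append(w)
        comps.append(comp)
    return comps


def is_lb_simplicial_edge(adj, a, b):
    closed = {a, b} | adj[a] | adj[b]                    # e ∪ N(e)
    for comp in components(adj, set(adj) - closed):
        sep = set().union(*(adj[x] for x in comp)) - comp   # N(C), Theorem 2
        for co in components(adj, sep, complement=True):
            if not (co <= adj[a] or co <= adj[b]):
                return False                             # e is not S-saturating
    return True


def lb_wt(adj):
    """Return (True, None), or (False, e) for an edge e that fails the test."""
    for a in adj:
        for b in adj[a]:
            if a < b and not is_lb_simplicial_edge(adj, a, b):
                return False, (a, b)
    return True, None
```

An edge with $e \cup N(e) = V$ yields no component and is therefore accepted, in line with the convention stated after Definition 3.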



Remark 1. In a weakly triangulated graph, minimal separators which are not in
any edge neighborhood are an exception; this means that every component in
$\mathcal{C}(S)$ is restricted to a single (confluent) vertex, thus the graph would be of diam-
eter two. Consequently, just as in a triangulated graph every minimal separator
is included in some vertex neighborhood, in a non-trivial weakly triangulated
graph, every minimal separator is included in some edge neighborhood. Thus
LB-type algorithms actually scan the whole set of minimal separators and test
them. Just as minimal separators in a triangulated graph are characterized as
cliques included in the neighborhood of a vertex, the minimal separators of a
weakly triangulated graph can be characterized by the LB-simplicial edges whose
neighborhood contains them.

Example 1. In Figure 1, we use the first example from Hayward’s original paper.
This graph is weakly triangulated, isomorphic to its complement, and devoid
of clique separators, and thus of simplicial and co-simplicial vertices. We will only
demonstrate LB-simpliciality of one edge.

LB-simpliciality testing of edge bh: $N(bh) = \{d, e, f, g\}$, and $\mathcal{C}(N(bh) \cup bh) =
\{\{a, c\}\}$. The only minimal separator of G included in the neighborhood of bh is
$N(\{a, c\}) = \{d, e, g\}$. Connected components of $\overline{G}(\{d, e, g\})$ are {d} and {e, g}.
Vertex h sees both vertices in {e, g}, and b sees d. Hence bh is {d, e, g}-saturating
and thus LB-simplicial.

Note that $de \cup N(de) = V$, thus edge de will generate no minimal separator.
The total set of minimal separators is $\mathcal{S}(G) = \{\{a, d, g\}, \{a, d, h\}, \{b, e, g\},
\{b, e, h\}, \{c, e, g\}, \{d, e, g\}, \{d, e, h\}, \{d, f, h\}\}$.




Fig. 1. A weakly triangulated graph. 



Remark 2. Note that, in Algorithm LB-WT, the edges are processed in an ar- 
bitrary order. We would thus like to draw the reader’s attention to the parallel 
spirit of this algorithm. Since no edges are added, and the LB-simpliciality of an 
edge does not depend on that of any other edge, LB-simpliciality testing for all 
edges can be done in parallel. For a shared memory parallelization, each proces- 
sor that becomes idle picks an unprocessed edge from the global queue of edges, 
and checks whether this edge is LB-simplicial. Every processor is able to read the 
graph which is stored globally, whereas each processor locally computes the com-
ponents of $G(V - (e \cup N(e)))$ and all the information that is necessary to establish
the LB-simpliciality of the edge being checked. None of the previous recognition
algorithms have the property that edges or vertices of the graph can be processed 
independently and in an arbitrary order. Moreover, the straightforward parallel 
implementation described here can be enhanced along the guidelines presented 
in the complexity analysis given in the next subsection. 
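
The shared-memory scheme of Remark 2 can be sketched as follows. This is only an illustration under stated assumptions: the names `check_all_edges` and `is_lb_simplicial` are hypothetical, the per-edge test is passed in as a parameter rather than implemented, and a Python thread pool stands in for the processors and the global queue of edges.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all_edges(graph, edges, is_lb_simplicial, workers=4):
    """Check every edge independently; return True iff all edges pass.

    graph            -- shared, read-only adjacency structure
    edges            -- iterable of edges, in arbitrary order
    is_lb_simplicial -- hypothetical per-edge test: (graph, edge) -> bool
    """
    edges = list(edges)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each idle worker takes the next unprocessed edge. No result
        # depends on any other edge, so no ordering is required.
        results = pool.map(lambda e: is_lb_simplicial(graph, e), edges)
        return all(results)
```

Because the graph is never modified, the workers need no locking; only the separator data structure of the next subsection would need synchronization if it were shared.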



4.4 Complexity 

For each of the m edges ab in A, computing $ab \cup N(ab)$ requires $O(n)$ time.
Computing the minimal separators contained in the neighborhood of ab requires
computing the connected components of $C(ab \cup N(ab))$ as well as their
neighborhoods in G, according to TheoremEl and this can be done in a single $O(m)$-time
graph search for each edge ab. We will encounter at most n minimal separators
included in the neighborhood of ab. Moreover, the sum of the number of vertices
in these separators will be less than m for each edge ab. Thus for each edge ab,
encountering the minimal separators included in $N(ab)$ can be done in $O(m + n)$
time, and at most $O(n)$ separators will be encountered.

One problem is that we may encounter the same separator many times, and 
we do not want to keep several copies of the same separator. Since each edge 
encounters at most n separators, we might have a total of mn separators if we 
allow multiple copies. However, it has been shown that the number of minimal
separators in a weakly triangulated graph is at most $m + n$. In order to avoid
multiple copies, we use a suitable data structure to memorize the minimal separators
and their co-connected components. Such a structure is described by Nourine
and Raynaud. It guarantees that, if the number of separators
kept in the structure is $O(m)$, then in $O(n)$ time, we can check whether each
newly encountered separator is already in the structure, and if not, insert it in 
the structure. 

Let us look in more detail into how the LB-simpliciality testing of an edge 
e = ab can be done: 



foreach x ∈ N(e) do
    if x sees only a then l(x) = 1;
    else if x sees only b then l(x) = 2;
    else l(x) = 3;

foreach S ⊆ N(e) do
    if S is not yet in S(G) then
        insert S in S(G);
        compute the set of connected components of G(S) and insert them in
        the data structure;
    foreach component S_j of G(S) do
        if ∃ {x, y} ⊆ S_j | l(x) = 1 and l(y) = 2 then
            return(e is not LB-simplicial);

return(e is LB-simplicial);
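
The labeling step and the saturation check of the procedure above can be sketched in Python. This is a minimal sketch under assumptions: the function names are hypothetical, the graph is a dictionary of adjacency sets, and the components of each separator are supplied by the caller (in the paper they come from the separator data structure).

```python
def label_neighbors(adj, a, b):
    """Label each x in N(ab): 1 = sees only a, 2 = sees only b, 3 = sees both."""
    labels = {}
    for x in (adj[a] | adj[b]) - {a, b}:
        if x in adj[a] and x in adj[b]:
            labels[x] = 3
        elif x in adj[a]:
            labels[x] = 1
        else:
            labels[x] = 2
    return labels


def is_saturating(labels, components):
    """True iff in every component some endpoint of ab sees all vertices.

    Endpoint a sees a whole component iff no vertex in it has label 2,
    and b sees it iff no vertex has label 1, so the test fails exactly
    when a component holds both a label-1 and a label-2 vertex.
    """
    for comp in components:
        seen = {labels[x] for x in comp}
        if 1 in seen and 2 in seen:
            return False
    return True
```

An edge would then be reported LB-simplicial when `is_saturating` holds for every minimal separator contained in its neighborhood.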



The first foreach loop requires $O(n)$ time. For the second foreach loop,
because a weakly triangulated graph has at most $n + m$ minimal separators, the
outer loop must be terminated if the number of minimal separators stored in
the data structure exceeds $n + m$. In this case, we can readily conclude that the
given graph is not weakly triangulated. In any case, each outer loop has at most
$O(n)$ iterations.

Assuming the above mentioned restriction, processing each minimal separa- 
tor S requires: 

1. - If S is not in S(G): a search and an insertion, $O(n)$ per separator, and
     then the computation of the set of co-connected components, $O(m)$ per
     separator. These operations are only done once per separator, which
     ensures a global complexity of $O(m^2)$ for this.

   - If S is in S(G): a search and retrieval of the set of co-connected
     components from the data structure, $O(n)$. These operations may have to
     be done several times for each separator, thus this takes $O(n^2)$ for each
     edge since there are at most $O(n)$ separators in $N(e)$ (or, equivalently,
     $O(n)$ iterations in the outer loop).

2. In either case we must check whether edge e is S-saturating:
   $O(\sum |S|, S \subseteq N(e))$, i.e. $O(m)$ for each edge e.

The global time complexity is thus $O(n^2 m)$, since there are m steps in
Algorithm LB-WT each corresponding to an edge of the given graph. Note that only
the second part of Case 1 mentioned above requires $O(n^2)$ time for each edge,
which leads us to conjecture that an amortized complexity analysis would yield
a global $O(m^2)$ time complexity.
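
The bookkeeping behind these bounds can be written out as follows. This is a summary sketch rather than part of the original analysis; the names $T_{\mathrm{new}}$, $T_{\mathrm{edge}}$ and $T_{\mathrm{total}}$ are introduced here for illustration, and $n = O(m)$ is assumed (for instance, a connected graph).

```latex
\begin{align*}
T_{\mathrm{new}}   &= (n + m)\,\bigl(O(n) + O(m)\bigr) = O(m^2)
  && \text{each separator is inserted once} \\
T_{\mathrm{edge}}  &= O(n) \cdot O(n) + O(m) = O(n^2 + m)
  && \text{lookups and saturation checks for one edge} \\
T_{\mathrm{total}} &= T_{\mathrm{new}} + m \cdot T_{\mathrm{edge}}
                    = O(m^2) + O(n^2 m + m^2) = O(n^2 m)
  && \text{since } m \le n^2
\end{align*}
```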

5 Conclusion 

We have shown new structural properties for the class of weakly triangulated 
graphs, and we have established a strong relationship between this class and 
triangulated graphs. Based on this novel insight, we have introduced a new 
recognition algorithm for weakly triangulated graphs, which is easy to follow and 
understand, and which does not use any of the previously introduced techniques 
for recognition. Though we have not improved the current recognition
complexity, our algorithm represents a new step towards a better understanding
of this class. In addition, our algorithm possesses a great potential for parallel
implementations.

We leave open the question of computing a “weak triangulation” of an arbitrary
graph, which would help in generating weakly triangulated graphs arbitrarily.

Abstract. We study web caching when the input sequence is a depth
first search traversal of some tree. There are at least two good motivations
for investigating tree traversal as a search technique on the WWW:

First, empirical studies of people browsing and searching the WWW have 
shown that user access patterns commonly are nearly depth first traver- 
sals of some tree. Secondly, (as we will show in this paper) the problem 
of visiting all the pages on some WWW site using anchor clicks (clicks 
on links) and back button clicks — by far the two most common user 
actions — reduces to the problem of how to best cache a tree traversal 
sequence (up to constant factors). 

We show that for tree traversal sequences the optimal offline strategy 
can be computed efficiently. In the bit model, where the access time of a 
page is proportional to its size, we show that the online algorithm LRU is 
$(1 + 1/\epsilon)$-competitive against an adversary with unbounded cache as long as
LRU has a cache of size at least $(1 + \epsilon)$ times the size of the largest item
in the input sequence. In the general model, where pages have arbitrary
access times and sizes, we show that in order to be constant competitive,
any online algorithm needs a cache large enough to store $\Omega(\log n)$ pages;
here n is the number of distinct pages in the input sequence. We provide 
a matching upper bound by showing that the online algorithm Landlord 
is constant competitive against an adversary with an unbounded cache
if Landlord has a cache large enough to store the $\Omega(\log n)$ largest pages.
This is further theoretical evidence that Landlord is the “right” algorithm
for web caching.

* Supported in part by NSF Grant GGR-9734927 and by ASOSR grant F49620010011

** Supported by the START program Y43-MAT of the Austrian Ministry of Science.

*** Supported in part by NSF Grant GGR-9734927 and by ASOSR grant F49620010011

† Supported by the START program Y43-MAT of the Austrian Ministry of Science.

M. Halldorsson (Ed.): SWAT 2000, LNCS 1851, pp. ISO-EHl 2000.

Springer-Verlag Berlin Heidelberg 2000

Caching for Web Searching 151



1 Introduction 

1.1 Problem Statement and Motivation 

Web caching is the temporary local storage of WWW pages by a browser for 
later retrieval. From the user’s point of view, the primary benefit of caching 
is reduced latency, as the time to access locally stored objects is minimal. We 
adopt the following standard general model of web caching 

Web Caching Problem Statement: The browser is given an online sequence
S of page requests, where each page $p_i \in S$ has a size $s(i)$ (say, in bytes) and
an access time $t(i)$ that is required if $p_i$ is not cached. If the requested page
$p_i$ is not in the cache (this is called a cache miss), then the time to access $p_i$
is $t(i)$. Otherwise, if $p_i$ is in the cache (this is called a cache hit), then $p_i$ may
be accessed instantaneously. After the request of page $p_i$, but before the next
request, the algorithm may evict/decache any arbitrary collection of pages and
put $p_i$ in its cache. At no time can the aggregate sizes of the pages in cache
exceed the fixed cache size k. The objective function is to minimize the total
access time.
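
The problem statement can be made concrete with a small cost evaluator. This is a sketch under assumptions: the name `total_access_time` and the callback interface `keep` are hypothetical and only encode the rules stated above (pay $t(i)$ on a miss, then optionally evict pages and cache the requested page, never exceeding k).

```python
def total_access_time(requests, size, time, k, keep):
    """Total access time of a caching strategy on a request sequence.

    requests -- sequence of page ids
    size     -- dict: page id -> s(i)
    time     -- dict: page id -> t(i)
    k        -- cache size
    keep     -- strategy callback (hypothetical interface): given the
                current cache and the page just requested, it returns
                the new cache contents
    """
    cache, cost = set(), 0
    for p in requests:
        if p not in cache:
            cost += time[p]                     # cache miss
        new_cache = set(keep(frozenset(cache), p))
        # Only pages already cached, or the page just requested, may be kept.
        if not new_cache <= cache | {p}:
            raise ValueError("strategy cached a page that was not requested")
        if sum(size[q] for q in new_cache) > k:
            raise ValueError("cache capacity exceeded")
        cache = new_cache
    return cost
```

A strategy that returns the empty set models the non-forced option of never caching; one that returns `cache | {p}` keeps everything and is only legal when all pages fit.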

Note that we adopt the non-forced caching model here, that is, the algorithm 
need not cache an accessed page. The differences between the results for forced 
caching and for non-forced caching models are negligible. 

All of the previous work on Web Caching that we are aware of assumes that
the sequence S may be arbitrary. In this paper we consider the case that S is
a depth first traversal of some tree T of pages. (Note that this restriction on
the input allows us to obtain results that are stronger in a fundamental way.)
We are motivated to consider this problem for two reasons. The first reason is
that empirical studies of people browsing and searching the WWW have shown
that user access patterns are commonly nearly depth first tree traversals.
That is, people tend to visit new pages via an anchor click (more than 50% of
user actions are anchor clicks), and to revisit pages using the back button
(more than 40% of user actions are back button clicks). No other action
accounts for more than 2% of users’ actions. Secondly, we show that the
problem of visiting all the pages on some WWW site using anchor clicks and 
the back button essentially reduces to the problem of how to best cache a tree 
traversal sequence. This may be viewed as providing theoretical justification for 
tree traversal as a search technique on the WWW. 

Site Search Problem Statement: Informally, the searcher starts at the home
page $p_h$ (say for example, www.microsoft.com) of some WWW site (say
Microsoft’s WWW site) with unknown topology. The searcher’s goal is to visit
every page reachable from the home page using anchors and the back button.
More formally, an online algorithm starts at some node in an initially unknown
directed graph G. Each node in G is a page $p_i$ with size $s(i)$ and access time
$t(i)$. We assume that every node in G is reachable from the start page. When the
online algorithm visits a node $p_i$ it learns $s(i)$, $t(i)$, and the names of each page
$p_j$ such that $(p_i, p_j)$ is a directed edge in G. If $p_i$ is not in the cache then the
online algorithm must pay $t(i)$, otherwise the online algorithm pays nothing for
this visit. After visiting $p_i$, the algorithm may decache any arbitrary collection
of pages and put $p_i$ in its cache. At no time may the aggregate sizes of the pages
in the cache exceed k. After making its caching decision the online algorithm
may make one of two moves. First, it can push $p_i$ onto a stack S, and then move
to a page $p_j$ with the property that $(p_i, p_j)$ is an edge in G. Second, it can pop
the top page $p_j$ off of S and return to $p_j$. The online algorithm must visit every
page and return to the initial home page $p_h$. (Note that the requirement that
the online algorithm return to the initial page is for convenience. Dropping this
requirement will only change the competitive ratio by at most a factor of two.)
The objective function is to minimize the aggregate access time of those visits
where the visited page was not cached at the time of the visit.
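
The two moves of the searcher (push and follow an anchor, or pop and go back) can be illustrated with a depth first search that records every visit. This is a sketch under assumptions: the name `dfs_site_search` is hypothetical, the link structure is passed in whole for simplicity (a real searcher learns the out-links of a page only when it visits that page), and caching decisions are left out.

```python
def dfs_site_search(out_links, home):
    """Return the visit sequence of a depth first site search.

    out_links -- dict: page -> list of pages it links to
    home      -- the home page p_h
    """
    visits, stack, seen = [home], [], {home}
    current = home
    pending = {home: iter(out_links.get(home, []))}
    while True:
        # Next link of the current page that leads to an unseen page.
        nxt = next((q for q in pending[current] if q not in seen), None)
        if nxt is not None:
            stack.append(current)               # anchor click: push and move
            seen.add(nxt)
            pending[nxt] = iter(out_links.get(nxt, []))
            current = nxt
        elif stack:
            current = stack.pop()               # back button: pop and return
        else:
            return visits                       # back at the home page, done
        visits.append(current)
```

The resulting sequence is a depth first traversal of a spanning tree of the site, which is the kind of input studied for Web Caching in this paper.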

Thus Site Search requires that the online algorithm must specify both a 
search strategy and a caching strategy. We show that, without loss of generality, 
online algorithms may restrict themselves to search strategies that traverse trees. 
That is, we show that the competitive ratio of every online algorithm A for
the Site Search Problem is $\Theta\left(\max_{T \in \mathcal{T}_n} A(T)/t(T)\right)$, where $\mathcal{T}_n$ is the collection of all
directed rooted trees on n nodes with edges directed away from the root, $A(T)$
is the total access time for algorithm A on the tree T assuming that it starts at
the root of T, and $t(T) = \sum_{p_i \in T} t(i)$ is the aggregate access time of the nodes
in T.

Note that there are some differences between Site Search on trees and Web 
Caching on depth first tree traversal sequences. The online algorithm in Site 
Search may decide how it will traverse the tree T (this traversal need not be a 
depth first search traversal), while the online algorithm for Web Caching does 
not have this power. In Site Search, the online algorithm learns the degree of a 
node when it visits that node, which is not the case in Web Caching on depth 
first tree traversals. And most importantly, the competitive ratio for an online 
algorithm A for Site Search on a tree compares A{T) against the aggregate 
access times t{T) of the pages in T, while for Web Caching on depth first tree 
traversal sequences S of T, the competitive ratio compares A(S) against the 
optimal offline cost. We will show that the optimal offline cost may be much 
higher than t(T). 

There are three special cases of the caching models that have been studied
previously. In the bit model the access time is assumed to equal the size of
the page. This model would be appropriate if the pages are large and the delay
in the network is small. In the cost model the size of each page is one, while
the access times are allowed to be arbitrary. This is an appropriate model if the
page sizes are roughly equal. For our purposes in this paper, the cost model is
really no easier for online algorithms than the general model. In the fault model
the access time for each page is constant, while page sizes may be arbitrary.
The fault model will not interest us here since it is obvious that every online 
algorithm without a cache is constant competitive against an adversary with an 
unbounded cache in the fault model for Web Caching on depth first tree traversal 
sequences. 



1.2 Our Results 

Our results differ from prior work on caching in a fundamental way. In particular, 
we bound the size of cache required by an online algorithm in order to be constant 
competitive against an offline optimal algorithm that uses an unbounded amount 
of cache. 

In section 2 we give the following foundational results. We show that the
competitive ratio of any online algorithm for Site Search is $\Theta\left(\max_{T \in \mathcal{T}_n} A(T)/t(T)\right)$.
We give a pseudo-polynomial time offline dynamic programming algorithm to
compute Opt(S) when S is a depth first tree traversal. This stands in contrast
to offline Web Caching for general sequences, where no pseudo-polynomial time
algorithm is known.

In section 3 we investigate Site Search and Web Caching under the bit model.
For Web Caching, we show that the online algorithm LRU is $(1 + 1/\epsilon)$-competitive
against an adversary with unbounded cache as long as LRU has a cache of size at
least $(1 + \epsilon)L$, where L is the size of the largest item in the input sequence. Note
that an algorithm with unbounded cache only has to pay to access each item
once; so another way to state this result is that the total access time for LRU
is at most $(1 + 1/\epsilon)$ times the aggregate access times of the pages regardless of
how often these pages are accessed. Similarly, for Site Search we show that the
online algorithm that uses a depth first traversal and LRU is $(1 + 1/\epsilon)$-competitive
against an adversary with an unbounded cache as long as LRU has a cache of
size at least $(1 + \epsilon)L$.

In section 4 we give lower bounds on the competitive ratios for Web Caching
and Site Search in the cost model (obviously these also hold in the general 
model). We first show a lower bound of f2 (min(fc, on the competitive 

ratio of any deterministic online algorithm for Web Caching. We then show a 
lower bound of 17 ^max(i, on the competitive ratio of any deter- 

ministic online algorithm for Site Search. We accomplish this by showing that 
maxter„ 17 (^max , where OPTfe(T) is the opti- 

mal offline cost for Site Search on T. Thus these results show that for both Web 
Caching and Site Search, an online algorithm needs a logarithmically sized cache 
to be constant competitive. 

In section 0 we analyze the online algorithm Landlord (this algorithm is a 
generalization of LRU and is also called Greedy-Dual-Size in the literature) 0, 
E], M Although we will state all results in the cost model, the results hold for 
the general model if k is replaced by We show that Landlord is 

O ^min -competitive for Web Caching on depth first tree
traversal sequences. We also show that the online algorithm that uses a depth
first traversal and Landlord is O ^min {k, ^-competitive for Site 

Search. The proper way to interpret this result is that for both Site Search and 
for Web Caching on tree sequences, Landlord is constant competitive against 
an adversary with an unbounded cache if Landlord has a cache large enough 
to hold at least logn pages. That is, a multiplicative increase in the number 
of pages only requires an additive increase in cache size to remain competitive 
against an adversary with infinite cache. Yet another way to interpret this result 
is that the number of pages that an adversary has to use to really fool Landlord 
is exponential in the cache size. 

To date, Landlord appears to be the theoretical champion for Web Caching
on arbitrary sequences . Sectional and section0 together show that even if 

Landlord is not the theoretical champion for depth first tree traversal sequences, 
then at least it is not far away from being the champion. That is, even if one 
was going to design an online algorithm specifically for depth first tree traversal 
sequences, one could not do a whole lot better than Landlord. We take this 
as further theoretical evidence that Landlord is the “right” algorithm for Web 
Caching. 

In section 0 we introduce a new algorithm Slumlord for Web Caching on 
depth first tree traversal sequences. Slumlord is a variation of the algorithm 
Landlord in that instead of raising the rent on every page in the cache, Slumlord
in some sense only raises the rent on the one page that can least afford to pay 
(this uses the rent analogy from 1 1 4j). Thus Slumlord is a much more conservative 
algorithm than Landlord as it will wait longer to evict pages. We show that all 
of the analysis results on Landlord from section 0 also hold for Slumlord. Our 
purpose for introducing Slumlord is that we have some reason to suspect that in 
some cases, i.e. for some particular relationships between k and n, Slumlord may 
perform slightly better than Landlord on depth first tree traversal sequences. 

Notice that LRU is what we will call an oblivious algorithm, in that it ignores 
the access times of the pages. In section^ we consider oblivious algorithms in the 
cost model. We show that the online algorithm Least Frequently Evicted (LFE) 
is optimally competitive among oblivious online algorithms for Web Caching on 
tree traversal sequences. Furthermore, for Site Search we show that the online 
algorithm that uses a depth first traversal and LFE is strongly competitive if 
$k = O(1)$. An online algorithm A is strongly competitive for a problem V if the
competitive ratio of A is at most a constant factor worse than the competitive
ratio of any other online algorithm for V. This is in contrast to Site Search in the
cost model with $k = \omega(1)$, and to Web Caching in the cost model over all ranges
of k, where we show that there are no strongly competitive oblivious algorithms.

Due to space limitations, most of the proofs could not be included in this 
version of the paper. These proofs may be found in the full version of the paper, 
on the third author’s home page.

1.3 Related Results

We first discuss known results for offline Web Caching. It is easy to see that Web
Caching in the bit model (and in the general model) is NP-hard in the ordinary
sense. Polynomial-time offline $O(\log k)$-approximation algorithms have been given
for the bit model and for the fault model. A polynomial-time offline
$O(1)$-approximation has also been given, provided that the polynomial time algorithm is given
additional $O(L)$ cache, where L is the size of the largest page. Additionally,
a polynomial time offline $O(\log(k + L))$-approximation algorithm has been derived.

Next, let us consider online Web Caching. The algorithm Greedy-Dual-Size 
is introduced in [^, where it is shown to be /c-competitive. Greedy-Dual-Size is 
a generalization of the algorithm Greedy-Dual in P3| that is specific for the cost 
model. In P! it is shown that Greedy-Dual-Size (this paper introduces the name 
Landlord for this algorithm) is -competitive against an adversary with 

cache size h assuming forced caching. In H3 it is also shown that in some sense for 
most choices of k, the retrieval cost is either insignificant or the competitive ratio 
is constant. In 0 it is shown, using linear programming duality, that Greedy- 
Dual-Size is -competitive against an adversary with cache size h assuming 

non-forced caching. In 0 online randomized O (log^ A:) -competitive algorithms 
are given for the bit and fault models . 

Previous researchers have theoretically studied the caching problem with 
uniform times and uniform sizes under particular input patterns. In 0 (and in 
several follow-up papers) the input is assumed to be a walk in a graph, and in 
im the input is assumed to be the output of a Markov chain. In jOj it is shown 
that if the input sequence is a depth first traversal of a tree then LRU will have 
2n — k cache misses, and that LRU always performs better than Most Recently 
Used on depth first traversal sequences. 

In 0 the direct-mapped caching problem was studied with sequential access 
sequences. Perhaps the most closely related result to the search part of Site 
Search is in cn); Recasting the results from a geometric setting to the Site Search 
setting, it is shown in m that there is an online algorithm that is constant 
competitive if A: = 0, G is planar, and the edge relation in G is symmetric. 

2 Foundational Results 

Theorem 1. For Site Search in any model (general, cost, or bit), the competitive
ratio of every deterministic online algorithm A is $\Theta\left(\max_{T \in \mathcal{T}_n} A(T)/t(T)\right)$.

Proof. The competitive ratio is at most $\max_{T \in \mathcal{T}_n} A(T)/t(T)$, since the online searcher
may perform a depth first traversal of the site and the offline searcher has to
access every page at least once.

To see why the competitive ratio is at least $\frac{1}{4} \max_{T \in \mathcal{T}_n} A(T)/t(T)$, let T be an
arbitrary directed rooted tree on n nodes with all edges directed away from the
root. Let $p_n$ be the last page in T visited by A. Create a directed graph G that
includes each directed edge in T and directed edges going from $p_n$ to every other
node in T. Then A’s actions on G are identical to A’s actions on T until $p_n$ is
visited. From $p_n$, A may return directly to the root; hence, $A(G) \ge \frac{1}{2} A(T)$. The
offline adversary may visit all of G incurring cost at most $2t(T)$ by traversing
the shortest path from the root to $p_n$, then visiting each remaining unvisited
node in a hub and spoke pattern from $p_n$, and then backing up to the root.

For an instance $T \in \mathcal{T}_n$ of Site Search, we define $\mathrm{OPT}_k(T)$ to be a minimum
access time strategy for visiting all the nodes in T and returning to the root of 
T assuming that the cache size is k. We show for Site Search on trees that the 
optimal offline algorithm may use any depth first search that it likes. Note that 
this in no way implies that the optimally competitive online algorithm uses 
depth first search. 

Lemma 1. For every depth first traversal S of a tree T, there is an optimal Site 
Search strategy that uses S to traverse the tree T. 



Lemma 2. For Web Caching on depth first tree traversals in the cost model,
and for Site Search in the cost model, OvTk{T) /t{T) < + 1 holds for any 

tree T with n nodes. 

We now give an optimal offline algorithm for Site Search on trees and for
Web Caching on depth first tree traversal sequences in the general model. Let
$p_r$ be a node in T with children $p_{c_1}, \ldots, p_{c_m}$. If $s(r) > k$ then obviously

$$\mathrm{OPT}_k(T_r) = t(r) + \sum_{i=1}^{m} \left[ \mathrm{OPT}_k(T_{c_i}) + t(r) \right].$$

So now consider the case that $s(r) \le k$. We say that $p_{c_i}$ is cheap if
$\mathrm{OPT}_{k-s(r)}(T_{c_i}) - \mathrm{OPT}_k(T_{c_i}) \le t(r)$, and otherwise we say that $p_{c_i}$ is expensive. It is easy to see
that one should cache $p_r$ before visiting a cheap child $p_{c_i}$ since the time savings
from having additional $s(r)$ cache is at most the access time for $p_r$. Similarly,
one should not cache $p_r$ before visiting an expensive child $p_{c_i}$ since one can reap
a time savings of more than $t(r)$ by having additional $s(r)$ cache during the
traversal of $T_{c_i}$. Hence,

$$\mathrm{OPT}_k(T_r) = t(r) + \sum_{\text{cheap } p_{c_i}} \mathrm{OPT}_{k-s(r)}(T_{c_i}) + \sum_{\text{expensive } p_{c_i}} \left[ \mathrm{OPT}_k(T_{c_i}) + t(r) \right].$$
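
The recurrence can be evaluated by memoized recursion, as in the following sketch. The function name and the dictionary-based tree encoding are assumptions made for illustration, and sizes are assumed to be integers. Taking the minimum of the two options for each child is equivalent to classifying it as cheap or expensive.

```python
from functools import lru_cache


def optimal_offline_cost(children, s, t, root, k):
    """Evaluate the recurrence for OPT_k(T_root).

    children -- dict: node -> list of child nodes
    s, t     -- dicts: node -> size, node -> access time
    """
    @lru_cache(maxsize=None)
    def opt(r, cap):
        kids = children.get(r, [])
        if s[r] > cap:
            # r never fits: pay for it again after every child subtree.
            return t[r] + sum(opt(c, cap) + t[r] for c in kids)
        total = t[r]
        for c in kids:
            cached = opt(c, cap - s[r])         # keep r cached while below c
            uncached = opt(c, cap) + t[r]       # drop r, pay for it on return
            total += min(cached, uncached)      # cheap child vs expensive child
        return total

    return opt(root, k)
```

Each pair (node, remaining capacity) is solved once, which matches the table-filling view of the dynamic program.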

The obvious dynamic programming implementation of this recurrence runs in
time $O(kn)$. Summarizing, this dynamic program yields a pseudo-polynomial
time algorithm for Web Caching of tree traversal input sequences in the bit
model and in the general model, and a polynomial time algorithm for the cost
model.

3 Bit Model

The algorithm Least Recently Used (LRU) evicts the least recently used items
until there is room to fit the most recently requested item in the cache. We show
that in the bit model the online algorithm LRU is $(1 + 1/\epsilon)$-competitive against
an adversary with unbounded cache as long as LRU has a cache of size at least
$(1 + \epsilon)L$. Recall L is the size of the largest item in the input sequence.

Theorem 2. Suppose $0 < \epsilon \le 1$ and that LRU is equipped with a cache
of size $k \ge (1 + \epsilon)L$. Then for Web Caching in the bit model where the input
sequence S is a depth first traversal of some tree T, the algorithm LRU guarantees
that $\mathrm{LRU}(S)/t(T) \le 1 + 1/\epsilon$.
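
To make the statement concrete, here is a small LRU simulation in the bit model, where the access time of a page equals its size. It is a sketch under assumptions: the name `lru_cost` is hypothetical, and the request sequence is supplied by the caller.

```python
from collections import OrderedDict


def lru_cost(requests, size, k):
    """Total access time of LRU in the bit model (access time = size)."""
    cache, used, cost = OrderedDict(), 0, 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)                # hit: p becomes most recent
            continue
        cost += size[p]                         # miss: pay t(p) = s(p)
        while cache and used + size[p] > k:
            _, freed = cache.popitem(last=False)    # evict least recent
            used -= freed
        if size[p] <= k:
            cache[p] = size[p]
            used += size[p]
    return cost
```

For a path a, b, c with all sizes equal to L = 2 and k = 4 (so $\epsilon = 1$), the depth first sequence a, b, c, b, a costs 8 while $t(T) = 6$, which respects the bound $(1 + 1/\epsilon)\,t(T) = 12$.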

Proof. We split the cost of LRU into the cost incurred while moving downwards
(from a parent down to a child) and the cost incurred while moving upwards
(from a child up to its parent). We show by an amortization argument that
the total cost for upward moves is at most $t(T)/\epsilon$. There is an account $\mathrm{Acc}_i$
associated with each page $p_i$, and there is an account Acc(LRU) for LRU.
Initially, $\mathrm{Acc}_i = t(i)/\epsilon$ for each page $p_i$ and Acc(LRU) = 0. When a page $p_i$ is
requested in a downward move, all accounts remain unchanged. When a node
$p_i$ is requested in an upward move and $p_i$ is not cached, then $t(i)$ is deducted
from Acc(LRU). If the request sequence is next going to visit another child of
$p_i$, then all the funds in Acc(LRU) are moved to $\mathrm{Acc}_i$, and LRU enters this
subtree with an empty account. Otherwise, if the request sequence returns to
$p_i$’s parent, then all the funds in $\mathrm{Acc}_i$ are transferred to Acc(LRU).

Our first goal is to show that during an upward move from a node $p_i$ towards
its parent, $\mathrm{Acc(LRU)} \ge \min(t(T_i)/\epsilon, L)$ always holds. The proof is done by
induction. The base case is if $p_i$ is a leaf. In this case the account of $p_i$ with
value $t(i)/\epsilon$ has just been transferred to LRU, and thus $\mathrm{Acc(LRU)} \ge t(i)/\epsilon = t(T_i)/\epsilon$
holds. Next assume that the claim holds for each of the children $p_{c_1}, \ldots, p_{c_m}$
of $p_i$. We break the proof into two cases: (Case 1) First, assume that for all j,
$1 \le j \le m$, $t(T_{c_j}) \le \epsilon L$ holds. Then every $T_{c_j}$ can be traversed without evicting
$p_i$, and $p_i$ will be kept cached throughout the traversal of $T_{c_j}$. Since no charges
are deducted from the searcher’s account at $p_i$, the inductive claim yields that
$\mathrm{Acc(LRU)} \ge t(i)/\epsilon + \sum_{j=1}^{m} t(T_{c_j})/\epsilon = t(T_i)/\epsilon$ holds at the moment when LRU leaves
$p_i$ upwards to its parent. (Case 2) Now assume that there exists a j, $1 \le j \le m$,
with $t(T_{c_j}) > \epsilon L$ and consider the moment in time when the searcher returns
from $p_{c_j}$ up to $p_i$. At this moment, $p_i$ need not be in the cache. By induction, the
value of Acc(LRU) is at least $L = \min(t(T_{c_j})/\epsilon, L)$. Hence, after (possibly) paying
the charge for visiting $p_i$, $\mathrm{Acc(LRU)} \ge L - t(i)$ holds. If the request sequence
now returns to $p_i$’s parent, then $\mathrm{Acc(LRU)} \ge (L - t(i)) + t(i)/\epsilon \ge L$ (where the
second term is the original amount in $\mathrm{Acc}_i$). Otherwise, if the request sequence
moves on to the next child of $p_i$, then $\mathrm{Acc(LRU)} \ge L - t(i)$ will be added to
$\mathrm{Acc}_i$ and so $\mathrm{Acc}_i \ge (L - t(i)) + t(i)/\epsilon \ge L$ and this amount will eventually be
transferred to Acc(LRU) before it moves up to $p_i$’s parent.

Next, we argue that Acc(LRU) is never negative, and that therefore LRU
can always pay for the revisits. Consider visiting a parent $p_i$ from a child $p_c$.
If $t(T_c) \le \epsilon L$ then $p_i$ is still cached at this moment and no charge is taken.
Otherwise, if $t(T_c) > \epsilon L$ then $\mathrm{Acc(LRU)} \ge L$ and LRU can afford the charge
since $t(i) \le L$ by the definition of L. Summarizing, the account of LRU stays
non-negative throughout the traversal of the tree. Since the total amount of
funds available in the beginning is $t(T)/\epsilon$ and since LRU is able to finance all its
upward moves from these funds, the total incurred cost indeed is at most $t(T)/\epsilon$.
Since the total cost of LRU for downward moves equals $t(T)$, the proof of the
theorem is complete.

This corollary is an immediate consequence of theorem 1 and theorem 2.

Corollary 1. For Site Search in the bit model, the algorithm that uses LRU and
a depth first traversal guarantees that $\mathrm{LRU}(T)/t(T) \le 1 + 1/\epsilon$.

It is easy to see that the above bounds are tight for $\epsilon = 1$ by considering
trees where each internal node has one child, and all pages have access time L.
Note that in Site Search this is not merely an artifice of our requirement
that the searcher return to the root, as one could always enforce this condition
by adding a second leaf-child of the root.



4 Lower Bounds in the General Model 

We show that in the cost model (and hence also in the general model), every
online algorithm for Web Caching and every online algorithm for Site Search
requires a cache of size $\Omega(\log n)$ in order to be constant competitive.

Theorem 3. For Web Caching in the cost model, any deterministic online
algorithm A fulfills the following statements.

(i) Let k and n be integers such that $k + 1 \le \lg n$. Then there exists a tree T
with $O(n)$ nodes on which A is $\Omega\left(\min(k + 1, n^{1/(k+1)})\right)$-competitive.

(ii) Let k and n be integers such that $k + 1 \le \lg n$. Then there exists a tree T
with $O(n)$ nodes such that $A(T)/t(T) \ge \frac{1}{4} n^{1/(k+1)}$.

Proof. The adversary constructs a tree T with $k + 2$ levels numbered $0, 1, \ldots, k + 1$.
Level 0 only contains the root of T, level 1 contains all the children of the root,
and so on. Every page at level i has access time $x^{k+1-i}$ where $x = \frac{1}{2} n^{1/(k+1)}$.
Note that $x \ge 1$ since $k + 1 \le \lg n$. Hence, every node has access time at least
one. The exact shape of T is determined by the adversary in dependence on the
behavior of the online algorithm A. The adversary follows a simple
Hit-Where-It-Hurts strategy. Let p be the last requested page, and let $\ell$ be the level that
contains p.

Expand: If $\ell < k + 1$, then the adversary creates a path of $k + 1 - \ell$
new pages at levels $\ell + 1, \ldots, k + 1$ that are descendants of page p. The
pages on this path are then requested one by one.

Hit: Otherwise, $\ell = k + 1$ holds. The adversary requests the ancestors
of p until it reaches a page that is currently not cached by the online
algorithm.

The adversary alternates between expansions and hits until it has created n
nodes (if this happens in the middle of an expansion or hit, this move is still
completed and then the process stops). Clearly, the thus created tree T has $O(n)$
nodes. By $n_\ell$, $0 \le \ell \le k + 1$, we denote the total number of nodes at the $\ell$-th
level of tree T. Note that $n_0 = 1$.
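
The Expand/Hit strategy can be simulated against a concrete online algorithm. The sketch below is an illustration under assumptions, not the paper's construction verbatim: the name `hit_where_it_hurts` is hypothetical, the online algorithm is fixed to LRU with k unit-size slots (cost model), and the value $x = \frac{1}{2} n^{1/(k+1)}$ with access time $x^{k+1-i}$ at level i is taken from the identity $x^{k+2} = x \cdot n / 2^{k+1}$ used later in the proof.

```python
from collections import OrderedDict


def hit_where_it_hurts(n, k):
    """Run the adversary against LRU with k unit-size cache slots.

    Returns (online_cost, total_access_time, nodes_created, x).
    """
    x = n ** (1.0 / (k + 1)) / 2.0              # assumed: x^(k+1) = n / 2^(k+1)

    def cost_at(level):
        return x ** (k + 1 - level)

    cache = OrderedDict()                       # LRU order, oldest first
    parent, level = {0: None}, {0: 0}           # node 0 is the root
    online_cost = 0.0

    def request(p):
        nonlocal online_cost
        if p in cache:
            cache.move_to_end(p)                # hit: free, refresh recency
            return
        online_cost += cost_at(level[p])        # miss: pay the access time
        if k > 0:
            if len(cache) >= k:
                cache.popitem(last=False)       # evict least recently used
            cache[p] = True

    request(0)
    cur = 0
    while True:
        # Expand: grow a path below cur down to level k + 1.
        while level[cur] < k + 1:
            new = len(parent)
            parent[new], level[new] = cur, level[cur] + 1
            request(new)
            cur = new
        if len(parent) >= n:
            break
        # Hit: climb until an ancestor that is not cached is requested.
        cur = parent[cur]
        while cur in cache:
            request(cur)
            cur = parent[cur]
        request(cur)

    total = sum(cost_at(l) for l in level.values())
    return online_cost, total, len(parent), x
```

With k slots and k + 1 proper ancestors above every leaf, some ancestor is always uncached, so each Hit phase ends with a paid request.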

Now let p be a page at level $\ell \le k$ with m children. Since all leaves of T are
at level $k + 1$, $m \ge 1$ holds. When the online algorithm pays for accessing p then
either the adversary is expanding the tree (and p is created) or the adversary
is hitting (and the request sequence returns from one of the m children). When
the request sequence returns from one of the first $m - 1$ children, the adversary
has just done a hit. The online algorithm pays for accessing p, and then the next
child is created in the following expansion. When the request sequence returns
from the last child, it immediately moves on to the parent of p and we are in
the middle of some hit. Altogether, for accessing page p, the algorithm A pays
m times the size of p, and for all the accesses to all the pages in level $\ell$, it pays
the total number of their children times their access time, that is, $n_{\ell+1} x^{k+1-\ell}$. For the
pages in level $k + 1$, A altogether pays $n_{k+1}$ times access time 1. Summarizing,
this yields



$$A(T) = n_{k+1} + \sum_{\ell=0}^{k} n_{\ell+1} x^{k+1-\ell} \ge x\,t(T) - x^{k+2} = x\left(t(T) - n/2^{k+1}\right) \ge \tfrac{1}{2}\, x\, t(T). \tag{1}$$

In the last inequality, we used that $t(T) \ge n$. This inequality holds since every
node has access time at least one.

One possible offline strategy always keeps all the predecessors of the currently
requested page in cache, with the exception of the pages at some fixed level $\lambda$
with $0 \le \lambda \le k$. Since T has only $k + 2$ levels and since there is no need to cache
the pages at level $k+1$, this strategy can always be carried out with a cache of size
k. This offline strategy has to pay for accessing a page (a) if the page is requested
for the first time, or (b) if the page is at level $\lambda$ and if the request sequence moves
from a page at level $\lambda + 1$ up to level $\lambda$. The total cost for (a) is $t(T)$, and the
total cost for (b) is $n_{\lambda+1} x^{k+1-\lambda}$. Hence, $\mathrm{Opt}(T) \le t(T) + \min_{0 \le \lambda \le k} \{ n_{\lambda+1} x^{k+1-\lambda} \}$,
and a simple averaging argument yields

$$\mathrm{Opt}(T) \le t(T) + \frac{1}{k+1} \sum_{\lambda=0}^{k} n_{\lambda+1} x^{k+1-\lambda} \le t(T) + \frac{x}{k+1}\, t(T). \tag{2}$$



By combining (1) and (2), we conclude that the competitive ratio of A is at least

$$\frac{A(T)}{\mathrm{Opt}(T)} \ge \frac{(k+1)\,x \cdot t(T)}{2(k+1+x)\,t(T)} = \frac{(k+1)\,x}{2(k+1+x)}. \tag{3}$$

Now let us prove statement (i) of the theorem. If $k+1 \le n^{1/(k+1)}$, then $k+1 \le 2x$
and we derive from (3) that

$$\frac{A(T)}{\mathrm{Opt}(T)} \ge \frac{(k+1)\,x}{2(k+1+x)} \ge \frac{(k+1)\,x}{6x} = \frac{1}{6}(k+1).$$

If on the other hand $k+1 > n^{1/(k+1)}$ holds, then $k+1 > x$ and we derive in a
similar way that

$$\frac{A(T)}{\mathrm{Opt}(T)} \ge \frac{(k+1)\,x}{2(k+1+x)} \ge \frac{(k+1)\,x}{4(k+1)} = \frac{x}{4} = \frac{1}{8}\, n^{1/(k+1)}.$$

This proves (i). Finally, statement (ii) follows from the inequality in (3) and
from $x = \frac{1}{2} n^{1/(k+1)}$. With this, the proof of the theorem is complete.

Now we turn to the Site Search problem. We know from theorem E that 
without loss of generality (and up to constant factors) we only need to consider 
online algorithms A that traverse some subtree T of G. However, we do not 
necessarily know that A performs a depth first traversal of T. To get around 
this difficulty, we consider the following Modified Site Search problem. It is easy 
to see that a lower bound on the competitive ratio for any online algorithm for 
Modified Site Search also yields a lower bound on the competitive ratio for any 
online algorithm for the original Site Search problem. 

Modified Site Search Problem: The online algorithm is told that the topol- 
ogy of G consists of a directed tree T, rooted at the initial page with the edges 
directed away from the initial page, and edges directed from a secret page Ps 
to every other page. The online algorithm is told T a priori, but is not told the 
identity of the secret node Ps (and actually, the adversary will make Ps the last 
node that the online algorithm visits). The goal of the online algorithm is still 
to visit all the nodes and to return to the initial page. 

Recall that $\mathrm{OPT}_k(T)$ is the minimum access time strategy, with cache size
k, for visiting all the nodes in T and returning to the root of T. Also recall
that by lemma ^ we may assume that $\mathrm{OPT}_k(T)$ uses a depth first search.
Note that $\mathrm{OPT}_k(T)$ can be computed by the online algorithm before it begins
its traversal. The competitive ratio of any online algorithm for Modified Site
Search is then f2 ^maxy ^ • We show that maxy is at least 



Theorem 4. For Modified Site Search, the competitive ratio of every determin- 
istic online algorithm A is at least fl ^max • 



Corollary 2. For Site Search in the cost model, the competitive ratio of every 
deterministic online algorithm A is at least f2 (max j j . 
5 Analysis of Landlord

We show that for Web Caching on depth first tree traversal sequences and for Site 
Search, Landlord is constant competitive against an adversary with unbounded 
cache if Landlord has a cache large enough to hold at least log n pages. 

Landlord Description: jS] The algorithm maintains a non-negative credit $c(i)$
for each page $p_i$ in the cache. Given a request for $p_i$, if $p_i$ is in the cache the
algorithm resets $c(i)$ to $t(i)$. Otherwise, the algorithm sets $c(i) = t(i)$ and
“pretends” $p_i$ is in the cache. Then it repeats the following eviction step while the
total size of the items in the cache exceeds k.

Eviction step: Let $p_m$ be a page in the cache that minimizes the ratio $c(m)/s(m)$, and
let $\delta = c(m)/s(m)$. For every $p_i$ in the cache, the algorithm decreases $c(i)$ by $\delta s(i)$,
and then evicts $p_m$.
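The Landlord description above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `landlord`, the dictionaries `size` and `cost` (standing for $s(i)$ and $t(i)$), and the tie-breaking rule (the first page with minimum credit-to-size ratio is evicted) are not from the text, and the ratio being minimized is assumed to be credit over size.

```python
def landlord(requests, size, cost, k):
    """Simulate Landlord on a request sequence (illustrative sketch).

    size[p] and cost[p] play the roles of s(p) and t(p); k is the cache size.
    Returns (total access time paid, set of pages cached at the end).
    """
    credit = {}  # page -> remaining credit; the keys are exactly the cached pages
    paid = 0
    for p in requests:
        if p not in credit:
            paid += cost[p]   # miss: the page has to be fetched
        credit[p] = cost[p]   # hit or miss: the credit is (re)set to t(p)
        # Eviction step, repeated while the cache is over capacity.
        # The requested page itself takes part, so it may be evicted again.
        while sum(size[q] for q in credit) > k:
            victim = min(credit, key=lambda q: credit[q] / size[q])
            delta = credit[victim] / size[victim]
            for q in credit:
                credit[q] -= delta * size[q]
            del credit[victim]
    return paid, set(credit)
```

For example, with unit sizes, `cost = {"a": 10, "b": 1, "c": 1}`, and `k = 2`, the sequence a, b, c, a keeps the expensive page a cached and evicts b when c arrives.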

Proposition 1. MM For Web Caching in the general model, Landlord is 
-competitive against an adversary with a cache of size h< k. 

Theorem 5. For Site Search in the cost model, the online algorithm that uses 
depth first search for traversing and Landlord for caching is 



fc < i log n, 

the cost model. Landlord is 
< ^logn caches, and 0{l)-competi- 

tive for k> ^ log n eaches. 

6 Slumlord 

Slumlord Description: The algorithm is identical to Landlord, except for two 
changes. Firstly, a page p is decached if the user hits the back button from p. 
Secondly the eviction step is different. 

Eviction step: Let $p_i$ be the currently requested page and let $p_j$ be the cached page
(other than $p_i$) that was most recently requested. Let $\delta = \min\{c(i)/s(i),\, c(j)/s(j)\}$. The
algorithm decreases $c(i)$ by $\delta s(i)$ and decreases $c(j)$ by $\delta s(j)$. The algorithm then
evicts one of $p_i$ and $p_j$ with zero credit.

We call the algorithm Slumlord for the following reason. In m the decre- 
menting of the credits was thought of as being analogous to raising rents. In 
the worst case trees the nodes lower in the trees have lower access times. So
Slumlord only raises the rent on those nodes lowest in the tree, which are also the
ones that can least afford to pay.



(i) O (jtn'=+i^-eompetitive if k < \/log n, 

(ii) O 0^nT^^-eompetitive if \/log n < 
(Hi) 0{1)~ competitive if k > ^logn. 

Theorem 6. For Web Caehing in 
O (min ( k, n'^ ] ] -eompetitive for k 
Theorem 7. For Web Caching, Slumlord is -competitive against an adversary
with a cache of size h < k.

Theorem 8. For Site Search in the cost model, the online algorithm that uses a 
depth first traversal and Slumlord for caching is O ^min -competi- 

tive. 

Theorem 9. For Web Caching in the cost model. Slumlord is 
O ^min (jt, -competitive. 

7 Oblivious Algorithms for the Cost Model 

We show that Least Frequently Evicted (LFE) is essentially the best oblivious
algorithm and that the algorithm for Site Search that uses depth first search and LFE
is strongly competitive if $k = O(1)$. Recall that an oblivious algorithm is one
that ignores access times.

LFE Description: An eviction count $e(p_i)$ is maintained for each page $p_i$.
Initially each $e(p_i) = 0$. Assume that the page $p_r$ has just been requested at
time u. If this is not the first time that $p_r$ was requested, the page $p_c$ requested
at time $u - 1$ is decached if $p_c$ is in the cache (note that $p_c$ is a child of $p_r$). As a
consequence of this, LFE maintains the invariant that all the pages that are in
the cache are on the path from the root to the last requested page. If the cache
was not full just before $p_r$ was requested, then $p_r$ is added to the cache. If the cache
was full before $p_r$ was requested, then LFE pretends that $p_r$ is in the cache and
selects a $p_i$ in the cache that minimizes $e(p_i)$; in case of a tie, $p_i$ is selected to
be the page closest to the root. Note that it may be the case that $i = r$. The
selected page $p_i$ is then evicted and $e(p_i)$ is incremented.

For fixed n and k, let 7 = 7(n, k) be the smallest integer that satisfies 
Observe that (fc + 1)(^)'^+^ < < (^ + 1) 

C^O+^^) )fc+i^ For fc < I logn, we have 7 = 6>(A:n^+T). We will call a page fat if 
it has at least 7 children, and otherwise we call the page skinny. Define a{k,€) 
as the minimum over all trees of the number of distinct fat pages that must be 
requested before LFE, with a cache of size k, causes the eviction count of some 
page to reach 7 + 1 '. We now state some preliminary lemmas that are necessary 
for the analysis of LFE. 

Lemma 3. //I<f'<7+1 then a{k, £) > . 

Theorem 10. For Web Caching in the cost model, LFE is (2^ -\-2) -competitive, 
and hence, 0{knT^)- competitive. 

Corollary 3. For Site Search in the cost model, the algorithm that uses LFE 
and depth first search is 0{kn>^)- competitive. 
We now show that every oblivious online algorithm for Web Caching has
competitive ratio $\Omega(k n^{1/(k+1)})$. Since any page could have nonzero access
time while all other pages have zero access time, an 7-competitive oblivious 
algorithm cannot miss any page more than 7 times. 

Theorem 11. For Web Caching in the cost model, every deterministic oblivious
online algorithm is $\Omega(k n^{1/(k+1)})$-competitive.
Abstract. We consider the on-line problem of scheduling jobs with
precedence constraints on m machines. We concentrate on two models,
the model of uniformly related machines and the model of restricted
assignment. For the related machines model, we show a lower bound
of $\Omega(\sqrt{m})$ for deterministic and randomized on-line algorithms, with or
without preemptions, even for jobs of known durations. This matches
the deterministic upper bound of $O(\sqrt{m})$ given by Jaffe for task systems.
The lower bound should be contrasted with the known bounds
for jobs without precedence constraints. Specifically, without precedence
constraints, if we allow preemptions then the competitive ratio becomes
$\Theta(\log m)$, and if the durations of the jobs are known then there are $O(1)$
competitive (preemptive and non-preemptive) algorithms.

We also consider the restricted assignment model. For the model with
consistent precedence constraints, we give a (randomized) lower bound of
$\Omega(\log m)$ with or without preemptions. We show that the (deterministic)
greedy algorithm (no preemptions used) is optimal for this model, i.e.
$O(\log m)$ competitive. However, for general precedence constraints, we
show a lower bound of m which is easily matched by a greedy algorithm.



1 Introduction 

We consider the on-line problem of scheduling a sequence of jobs with precedence
constraints on m parallel machines. A job can be scheduled after all its
predecessors are completed. In the simplest model, the identical machines model,
each job j has a running time $w_j$, and has to be scheduled on a machine for this
period of time.

In the related machines model each machine i has a speed $v_i$. Each job may
be processed on any machine and the time to process a job j with a running time
$w_j$ on i would be $w_j / v_i$. In the restricted assignment model all machines have
identical speed, but each job may be assigned only to a subset of the machines.
For a job j, we denote by $M(j) \subseteq \{1, \ldots, m\}$ ($M(j) \ne \emptyset$) the subset of machines
on which it may be scheduled and by $w_j$ its running time on a machine in $M(j)$.
The unrelated machines model is a generalization of all previous models. In this
model, each job j has a vector of m components, where for each i, component i
gives its running time on machine i.
We may or may not allow preemptions. If no preemptions are allowed, once a
job is scheduled on a machine, it must be processed on this machine continuously 
until it is completed. Otherwise, if we allow preemption, a job may be stopped, 
and resumed later on some (maybe different) machine. 

The precedence constraints between jobs can be viewed as a directed graph
G. The vertices of G are the jobs. An edge $(j_1, j_2)$ occurs when $j_1$ is a predecessor
of $j_2$, i.e. $j_2$ may start its process only after $j_1$ is completed. For restricted
assignment the precedence constraints are called consistent if for every edge $(j_1, j_2)$
we have $M(j_2) \subseteq M(j_1)$. The motivation for consistent precedence constraints
comes from the fact that if a job $j_1$ requires some expertise which is known
only to some machines and $j_1$ is a predecessor of another job $j_2$, then $j_2$ should
require at least the same expertise and hence can be processed only on a subset
of the machines that $j_1$ can be processed on.

We discuss an on-line environment in which a job becomes known as soon
as all its predecessors are completed (there are no release times). The goal is to
minimize the makespan, which is the time at which the last job is finished. We consider
two variations of the on-line model. In the known duration case, the duration of
a job is known upon its arrival, and in the unknown duration case, the duration
of a job becomes known only when it departs. Our lower bounds hold even for
the known duration case (and hence also for the unknown duration case) while
the algorithms do not use the information on the durations and therefore are
valid for both cases. For a survey on on-line scheduling we refer the reader to

m 

We measure the algorithms in terms of the competitive ratio. We compare
the cost (makespan) of the on-line algorithm (denoted by $C_{on}$) to the cost of
the optimal off-line algorithm that knows the sequence in advance (denoted
by $C_{opt}$). The off-line algorithm knows all jobs and their properties (running
time, precedence constraints and assignment restrictions) in advance. Note that
the on-line algorithm is familiar with all properties of a job as soon as the job
arrives (except for the running time, in the case of unknown durations), but a job
arrives only after all its predecessors are completed. A deterministic algorithm
is r competitive (has competitive ratio r) if $C_{on} \le r \cdot C_{opt}$. If the algorithm is
randomized, we use the expectation of the on-line cost instead of the cost and
the competitive ratio is r if $E(C_{on}) \le r \cdot C_{opt}$.

Our results. For related machines we give a deterministic and randomized
lower bound of $\Omega(\sqrt{m})$ on the competitive ratio of any on-line algorithm for
jobs with precedence constraints. This matches the upper bound of Jaffe [Zj 
who gave an approximation algorithm which can be implemented in an on-line 
environment. In fact, Davis and Jaffe |3j already gave a lower bound of $\Omega(\sqrt{m})$
for the case with no precedence constraints which obviously holds for the case 
of precedence constraints. However, their lower bound is valid only for unknown 
durations and no preemption. If we allow preemption then Shmoys, Wein and 
Williamson [0| showed an upper bound of $O(\log m)$ for the case of no precedence
constraints. Moreover, if the durations are known then in principle one can get
a 1-competitive algorithm for both the preemptive and non-preemptive cases. This
follows from the fact that if there are no precedence constraints then
all the jobs are known in advance and the problem becomes an off-line
one. This should be contrasted with our result, which implies that with precedence
constraints one cannot get a better bound than $\Omega(\sqrt{m})$ even if the durations are
known and the algorithms are preemptive. Moreover, our lower bound holds for
randomized preemptive on-line algorithms versus a deterministic non-preemptive
adversary.

For the restricted assignment model we consider the greedy algorithm which 
is an adaptation of the LIST algorithm of Graham m- This algorithm achieves 
a competitive ratio of 2 — 1/m for scheduling jobs with precedence constraints 
on identical machines. Epstein ^ shows that LIST is optimal for scheduling 
jobs with precedence constraints on identical machines, even if preemptions are 
allowed. Azar et al P] showed that for the case of no precedence constraints 
the greedy algorithm for scheduling jobs one by one in the restricted assignment
model achieves a competitive ratio of $O(\log m)$. We show that if we allow
consistent precedence constraints then the competitive ratio of the algorithm is
still $O(\log m)$. We show that the algorithm is optimal in this case by giving a
lower bound of $\Omega(\log m)$ on the competitive ratio of any deterministic or
randomized algorithm for scheduling jobs with restricted assignment and consistent
precedence constraints. We note that our lower bound does not follow from the
lower bound of Q since here we do not insist on scheduling a job immediately 
upon its arrival. Our lower bound holds even for the known duration case and 
the upper bound does not use the durations. Moreover, the lower bound holds 
even for randomized preemptive algorithm versus deterministic non-preemptive 
adversary while the upper bound holds for non-preemptive algorithm versus 
preemptive adversary. Again, the precedence constraints are crucial for proving
the lower bounds with known durations, since otherwise the problem becomes an off-line
problem: all jobs are given at the beginning, and durations are known in
advance.

For general precedence constraints we show a lower bound of m for any on-line
algorithm ($\Omega(m)$ for randomized algorithms). This bound is easily matched by
the greedy algorithm which is m competitive. This implies that the unrelated
machines case is not of interest since it is m competitive.

The Greedy algorithm. We adapt the Greedy algorithm "List", given
by Graham for identical machines, to the case of restricted assignment as
follows. Each time that a machine i becomes idle, assign to it a job j (if one exists)
such that $i \in M(j)$ and j has not been scheduled yet. Each time that a new job
j arrives, assign it to an idle machine $i \in M(j)$ if one exists. Note that Greedy is
deterministic and does not use preemptions.
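The two rules of Greedy can be illustrated with a small event-driven simulation. The following Python sketch is only one possible reading: the function name `greedy_schedule`, the input encoding, and the tie-breaking (jobs are considered in arrival order and placed on the lowest-numbered idle machine they may use) are assumptions made for the example, not part of the algorithm as stated.

```python
import heapq

def greedy_schedule(jobs, m):
    """Simulate Greedy for restricted assignment with precedence constraints.

    jobs maps a job name to (running_time, allowed_machines, predecessors).
    Machines are numbered 1..m. Returns (makespan, {job: (machine, start)}).
    """
    missing = {j: len(spec[2]) for j, spec in jobs.items()}  # unfinished predecessors
    successors = {j: [] for j in jobs}
    for j, (_, _, preds) in jobs.items():
        for p in preds:
            successors[p].append(j)

    waiting = [j for j in jobs if missing[j] == 0]  # arrived but not yet scheduled
    idle = set(range(1, m + 1))
    running = []  # min-heap of (finish_time, job, machine)
    schedule = {}
    now = 0
    while waiting or running:
        # Greedy rule: never leave a machine idle if an arrived job may use it.
        for j in list(waiting):
            free = sorted(idle & set(jobs[j][1]))
            if free:
                i = free[0]
                idle.remove(i)
                waiting.remove(j)
                schedule[j] = (i, now)
                heapq.heappush(running, (now + jobs[j][0], j, i))
        if not running:
            raise ValueError("a waiting job has no machine in 1..m it may use")
        # Advance time to the next completion; its successors may now arrive.
        now, done, i = heapq.heappop(running)
        idle.add(i)
        for s in successors[done]:
            missing[s] -= 1
            if missing[s] == 0:
                waiting.append(s)
    return now, schedule
```

A job enters `waiting` only when its last predecessor completes, which mirrors the on-line model in which a job becomes known as soon as all its predecessors are completed.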

Randomized algorithms. To prove lower bounds on the competitive ratio
of randomized algorithms we use an adaptation of Yao’s theorem for on-line
algorithms. It states that if there exists a probability distribution on the input
sequences for a given problem such that $E(C_{on}/C_{opt}) \ge c$ for all deterministic
on-line algorithms, then c is a lower bound on the competitive ratio of all randomized
algorithms for the problem (see 0). We will use only sequences for which $C_{opt}$
is constant and thus in our case $E(C_{on}/C_{opt}) = E(C_{on})/C_{opt}$.

2 Scheduling on Related Machines 

Theorem 1. Any on-line algorithm for scheduling on related machines has a
competitive ratio of at least $\Omega(\sqrt{m})$. This is true even for randomized preemptive
algorithms versus a deterministic non-preemptive adversary.

Proof. We start by considering deterministic algorithms. We assume without loss
of generality that $m - 1$ is a square, $\sqrt{m - 1} = r$. The set of machines consists
of one fast machine of speed $r = \sqrt{m - 1}$ (machine m) and $m - 1$ slow machines
of speed 1 (machines $1, \ldots, m - 1$). There are r phases of $r + 1$ unit jobs each in
the sequence. The sequence begins with $r + 1$ independent unit jobs (phase 1).
Next we define phase i, $2 \le i \le r$; the phase contains $r + 1$ unit jobs. Let $b_{i-1}$
be the job in phase $i - 1$ that is finished last by the on-line algorithm; then all jobs
of phase i depend on $b_{i-1}$.




Fig. 1. A possible on-line assignment in the proof of Theorem 1



The on-line algorithm, by the definition of $b_i$, can start scheduling phase $i + 1$
only after all jobs of phase i are completed. Since each phase consists of $r + 1$
jobs, it is possible to use at most $r + 1$ machines at each time. The $r + 1$ fastest
machines can process at most $2r$ unit jobs in one unit of time, and since the
total running time of all jobs in one phase is $r + 1$, each phase takes at least
$(r + 1)/(2r) > 1/2$ time units. Thus the total time to process all the sequence is
at least $r((r + 1)/(2r)) = (r + 1)/2 = \Omega(\sqrt{m})$ (see Figure 1).

The optimal off-line algorithm assigns each $b_i$ to the fast machine at time
$(i - 1)/r$, and thus the jobs of phase $i + 1$ may be assigned at time $i/r$ to machines
$ir + 1, \ldots, (i + 1)r$. The jobs of phase r would finish at time $(r - 1)/r + 1 < 2$ on
the slow machines. The fast machine would finish at time 1 and thus $C_{opt} < 2$
(see Figure 2). The competitive ratio is $\Omega(\sqrt{m})$.
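As a quick numeric illustration of the construction, the following Python sketch evaluates the two bounds from the proof for a concrete m. The function name `related_machines_instance` is hypothetical, and the code only restates the arithmetic of the proof; it does not simulate any schedule.

```python
from fractions import Fraction
import math

def related_machines_instance(m):
    """Return (number of jobs, on-line lower bound, off-line makespan).

    Assumes, as in the proof, that m - 1 is a perfect square r * r.
    """
    r = math.isqrt(m - 1)
    if r < 1 or r * r != m - 1:
        raise ValueError("the proof assumes m - 1 is a positive perfect square")
    jobs = r * (r + 1)                               # r phases of r + 1 unit jobs
    online_lower_bound = r * Fraction(r + 1, 2 * r)  # r phases, (r + 1)/(2r) each
    offline_makespan = Fraction(r - 1, r) + 1        # last phase ends on slow machines
    return jobs, online_lower_bound, offline_makespan
```

For m = 17 (so r = 4) this gives 20 jobs, an on-line lower bound of 5/2, and an off-line makespan of 7/4. The on-line bound grows like $\sqrt{m}/2$ while the off-line makespan stays below 2.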
Fig. 2. A possible off-line assignment in the proof of Theorem 1



To extend the proof for randomized algorithms, $b_i$ is chosen uniformly at
random among all jobs of phase i. Clearly the optimal schedule remains the
same. Next we evaluate the expected on-line schedule. The probability that the
time from the arrival of phase i until $b_i$ is completed is at least T would
be $(r + 1 - k)/(r + 1)$, where k is the maximum number of jobs that it is possible
to complete in time T. For $T = (r + 1)/(4r)$, it is possible to complete at most
$\lfloor (r + 1)/2 \rfloor$ jobs, and thus the expectation of the time that passes from the
arrival of $b_i$ until it is completed is at least $(r + 1)/(8r) > 1/8$, and thus
$E(C_{on}) = \Omega(\sqrt{m})$ and again the competitive ratio is $\Omega(\sqrt{m})$.

3 Restricted Assignment with Consistent Precedence 
Constraints 

In this section we consider consistent precedence constraints for the restricted
assignment model. Recall that precedence constraints are called consistent if for
every $j_1$ which is a predecessor of $j_2$ we have $M(j_2) \subseteq M(j_1)$.

Theorem 2. Any on-line scheduling algorithm for the restricted assignment
model with consistent precedence constraints has a competitive ratio of at least
$\Omega(\log m)$. This is true even for randomized preemptive algorithms versus a
deterministic non-preemptive adversary.

Proof. We assume without loss of generality that m is a power of 2, $m = 2^k$.
It is easy to extend the proof for general m. The sequence consists of mN jobs
where $N > 2\log_2 m = 2k$. The jobs belong to $k + 1$ phases, where for $1 \le i \le k$
phase i contains $m(N + 2 - i)/2^i$ unit jobs, and phase $k + 1$ contains $N - \log_2 m$
unit jobs. The jobs of phase i are restricted to machines $\{1, \ldots, 2^{k-i+1}\}$. We
define the dependencies according to the behavior of the on-line algorithm. Let
$b_1$ be the job that finishes last in phase 1; then all jobs of phase 2 depend on $b_1$.
For $i = 2, \ldots, k$, let $b_i$ be the job that finishes last in phase i; then all jobs of
phase $i + 1$ depend on $b_i$.

Since $b_i$ is the job that finishes last in phase i, and all jobs of phase $i + 1$
depend on it, no jobs of phase $i + 1$ are scheduled until all jobs of phase i
are done. For $1 \le i \le k$, the jobs of phase i are restricted to $2^{k-i+1} = m/2^{i-1}$
machines, thus the time to finish all jobs of phase i is at least $(N + 2 - i)/2 = \Omega(N)$
(even with preemptions). Since there are $\Omega(\log m)$ phases, $C_{on} = \Omega(N \log m)$
(see Figure 3).
Fig. 3. A possible on-line assignment in the proof of Theorem 2

Fig. 4. A possible optimal off-line assignment in the proof of Theorem 2

The optimal off-line algorithm schedules all $b_i$ on the first machine; each $b_i$
is scheduled at time $i - 1$. The jobs of phase i are scheduled as follows: $m/2^{i-1}$
jobs are scheduled on machines $1, \ldots, m/2^{i-1}$ at time $i - 1$, and all the other jobs
are scheduled from time i till time N on machines $m/2^i + 1, \ldots, m/2^{i-1}$. The
jobs of phase $k + 1$ are scheduled on machine 1 starting at time $\log_2 m$ (see Figure
4). We conclude that since $C_{opt} = N$, the competitive ratio is $\Omega(\log m)$.
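The counting in this construction is easy to check numerically. The Python sketch below is an illustration only: it assumes phase i has $m(N + 2 - i)/2^i$ unit jobs on $2^{k-i+1}$ machines, as reconstructed from the proof, and the function names `theorem2_instance` and `online_lower_bound` are hypothetical. It builds the phase sizes and sums the per-phase on-line lower bounds (jobs divided by available machines).

```python
from fractions import Fraction

def theorem2_instance(k, N):
    """Phase sizes of the lower-bound sequence for m = 2**k machines."""
    m = 2 ** k
    if N <= 2 * k:
        raise ValueError("the proof assumes N > 2 log2 m")
    phases = []
    for i in range(1, k + 1):
        jobs = m * (N + 2 - i) // 2 ** i   # exact, since 2**i divides m
        machines = 2 ** (k - i + 1)        # phase i may only use machines 1..2^(k-i+1)
        phases.append({"phase": i, "jobs": jobs, "machines": machines})
    # Phase k + 1: N - log2(m) unit jobs, restricted to machine 1.
    phases.append({"phase": k + 1, "jobs": N - k, "machines": 1})
    return m, phases

def online_lower_bound(phases):
    """Phases run one after another, each needing at least jobs / machines time."""
    return sum(Fraction(p["jobs"], p["machines"]) for p in phases)
```

For k = 3 and N = 7 the phases contain 32, 14, 6, and 4 jobs, which adds up to mN = 56. The on-line lower bound is 29/2, while the off-line schedule described above finishes at time N = 7.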

To extend the proof for randomized algorithms we use the same sequence, but
$b_i$ is chosen uniformly at random among all jobs of phase i. Again, let $P_i$ be the
number of jobs that finish before $b_i$ in phase i. The time after the jobs of phase i
become available and before the next phase can start is at least $(P_i + 1)/2^{k-i+1}$.
Since $P_i$ takes the values $0, \ldots, (N + 2 - i)2^{k-i} - 1$ with equal probability,

$$E(P_i) = \left((N + 2 - i)2^{k-i} - 1\right)/2 .$$

Hence,

$$E(C_{on}) \ge \sum_{i=1}^{k} E(P_i + 1)\, 2^{-k+i-1} + N - \log_2 m \ge \sum_{i=1}^{k} (N + 2 - i)/4 = \Omega(N \log m) .$$

Since $C_{opt} = N$ we conclude that the competitive ratio is $\Omega(\log m)$.



Theorem 3. The competitive ratio of Greedy is O(logm) for the restricted as- 
signment model with consistent precedence constraints. 

Proof. For machine i, let $A(i)$ be the set of jobs j such that $i \in M(j)$. Denote the
optimal off-line value by $\lambda$. We first prove the following Lemma:

Lemma 1. The total idle time on a machine i, from the beginning till the last
job in $A(i)$ finishes its process (on any machine), is bounded by $\lambda$. (Some of this
idle time may be after the last job on i is already completed).

Proof. For each machine i, we build a chain of jobs in which each job is dependent
on the previous job, and each time i is idle, one of the jobs in the chain is running.
Since the total running time of jobs in the chain is at most $\lambda$ (the optimal off-line
algorithm cannot run more than one job of the chain simultaneously), the total
idle time of machine i would also be bounded by $\lambda$. We build the chain from the
top, starting from the last job in the chain. If there is no idle time on machine
i, the chain is empty and the lemma follows. Otherwise, we start the chain with
the job in $A(i)$ that finishes last; denote it by $J_1$. Assume that $J_1, \ldots, J_{q-1}$ are
defined. If $J_{q-1}$ has no predecessors, we finish the chain. Otherwise, let $J_q$ be
the predecessor of $J_{q-1}$ that finishes last. Note that since all the chain consists
of predecessors of $J_1$ and the precedence constraints are consistent, all the jobs
in the chain are also in $A(i)$. Assume that i is idle at time t, and no job in the
chain is running at time t. There is at least one job that finishes after time t
($J_1$ for example). Since there is no job of the chain running at time t, all these
jobs start running after time t. Let $J_r$ be the first job of the chain that starts
running after time t. All the predecessors of $J_r$ finish before time t; thus, since i
is idle at t, $J_r$ could be scheduled at time t or before. This is a contradiction to
the definition of Greedy.

Note that it follows from Lemma 1 that the total idle time on a machine
from the beginning till the last job that runs on this machine is completed is
also bounded by $\lambda$.

Lemma 2. Let $l \ge 3\lambda$ be some time during the process of the algorithm. If the
total running time of jobs (or parts of jobs) that run after time l is $T_l$, then the
total running time of jobs that run after time $l - 3\lambda$ is at least $2T_l$.

Proof. Let $k_l = \lceil T_l/\lambda \rceil$. The optimal off-line uses at least $k_l$ machines to run
the jobs that the on-line runs after time l. Since the maximum running time is
bounded by $\lambda$, these jobs start after time $l - \lambda$. For each machine i among the
$k_l$ machines, there is a job that is allowed to be scheduled on it and is scheduled
after time $l - \lambda$, thus machine i has at most $\lambda$ idle time from time $l - 3\lambda$ till time
$l - \lambda$. The total running time on i in this time period is at least $\lambda$. Summing
over all machines, the total running time is at least $k_l \lambda$, and adding the running
times after time l we get a total of $k_l \lambda + T_l \ge T_l + T_l = 2T_l$.

Now, we can complete the proof of the theorem. Let T be the total running
time of all jobs; note that $T \le m\lambda$. Let $k = \lfloor C_{on}/(3\lambda) \rfloor$. We can assume without
loss of generality that $C_{on} \ge 3\lambda$. Hence $k \ge 1$. Note that the competitive ratio r
satisfies $r = O(k)$. Let $T_j$ be the total running time of jobs after time $C_{on} - 3j\lambda$.
According to Lemma 2, $T_k$ satisfies $T_k \ge 2^{k-1} T_1$, and according to Lemma 1, $T_1$
satisfies $T_1 \ge 2\lambda$. This is correct since there is at least one machine that finishes
at time $C_{on}$, and since the idle time on this machine is bounded by $\lambda$, this
machine worked at least for a period of time $2\lambda$ after time $C_{on} - 3\lambda$. Combining
all observations together we get $m\lambda \ge T \ge T_k \ge 2^{k-1} T_1 \ge 2 \cdot 2^{k-1} \lambda$. Thus
$k = O(\log m)$, and also $r = O(\log m)$.

4 Restricted Assignment with General Precedence 
Constraints 

In this section we consider the restricted assignment model with general prece- 
dence constraints between jobs. 

Theorem 4. Any on-line scheduling algorithm for the restricted assignment model
with general precedence constraints has a competitive ratio of at least m. This
is true even for preemptive algorithms versus a non-preemptive adversary. Any
randomized algorithm for the same problem has a competitive ratio of $\Omega(m)$.
This is true even for randomized preemptive algorithms versus a deterministic
non-preemptive adversary.
Proof. We first prove a lower bound on deterministic algorithms, and later extend
it to randomized ones. We build the sequence according to the behavior
of the on-line algorithm. Let N be an integer, $N > m$. The optimal cost for
the sequence would be N. The sequence contains m phases; in each phase, all
jobs are restricted to a single machine. Phase 1 contains N unit jobs which are
restricted to machine 1. Let $b_1$ be the job from phase 1 that finishes last. We
define the other phases recursively: In phase i ($i \ge 2$), there are $N - i + 1$ unit
jobs which depend on the job $b_{i-1}$, and are restricted to machine i. We denote
the job from phase i that finishes last by $b_i$.

The on-line algorithm does not schedule any job from phase $i + 1$ until all jobs of phase
i are completed, because all jobs of phase $i + 1$ depend on $b_i$. Thus the on-line algorithm
has at most one working machine at a time (each job is restricted to a single
machine) and the minimum possible on-line makespan is simply the sum of all
running times: $C_{on} \ge \sum_{i=1}^{m} (N - i + 1) = m(N - m/2 + 1/2)$ (see Figure 5).

The optimal off-line algorithm assigns each $b_i$ at time $i - 1$, and all other
jobs of phase i are scheduled starting from time i, hence $C_{opt} = N$ (see Figure
6). The competitive ratio is at least $m - m^2/(2N) + m/(2N) > m - m^2/(2N)$;
for large values of N, this number approaches m.
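To see how quickly the ratio approaches m, the sums in the deterministic argument can be evaluated directly. The Python sketch below is illustrative only; the function name `general_precedence_bounds` is hypothetical and the code merely restates the arithmetic of the proof.

```python
from fractions import Fraction

def general_precedence_bounds(m, N):
    """Return (on-line lower bound, off-line makespan, their ratio).

    Phase i consists of N - i + 1 unit jobs restricted to machine i, and the
    phases are forced to run one after another on-line.
    """
    if N <= m:
        raise ValueError("the proof assumes N > m")
    online = sum(N - i + 1 for i in range(1, m + 1))  # equals m(N - m/2 + 1/2)
    offline = N                                       # phase i occupies machine i until time N
    return online, offline, Fraction(online, offline)
```

With m = 4 and N = 100 the on-line cost is at least 394 against an off-line cost of 100, a ratio of 3.94, already close to m = 4.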




Fig. 5. A possible on-line assignment in the proof of Theorem 4
Fig. 6. A possible optimal off-line assignment in the proof of Theorem 4



To extend the proof to randomized algorithms we use a similar sequence,
which also has m phases, where phase i contains $N - i + 1$ jobs that are restricted
to machine i, but here the job $b_i$ for $i = 1, \ldots, m - 1$ is chosen uniformly at
random among all jobs of phase i. Let $P_i$ be the position of $b_i$, which is the
number of jobs from phase i that were completed before $b_i$ was completed. $P_i$
takes the values $0, \ldots, N - i$, all with equal probabilities. For $i \ge 2$, the jobs of
phase i are scheduled after at least $P_{i-1} + 1$ jobs were completed at phase $i - 1$,
and thus $C_{on} \ge \sum_{i=1}^{m-1} (P_i + 1) + N - m + 1$. Thus

$$E(C_{on}) \ge \sum_{i=1}^{m-1} \left(E(P_i) + 1\right) + N - m + 1 ,$$

and since $E(P_i) = (N - i)/2$ we get

$$E(C_{on}) \ge (m - 1)(N/2 + 1) - \sum_{i=1}^{m-1} i/2 + N - m + 1$$
$$= mN/2 - N/2 + m - 1 - m(m - 1)/4 + N - m + 1$$
$$= mN/2 + N/2 - O(m^2) .$$

Since $C_{opt} = N$, the competitive ratio is at least $(m + 1)/2 - O(m^2/N)$; for large
values of N, the lower bound approaches $(m + 1)/2 = \Omega(m)$.

Both lower bounds are valid even with preemptions since we only consider 
finishing times of jobs, and not starting times. 

Theorem 5. The competitive ratio of Greedy is m for the restricted assignment 
model with precedence constraints. 

Proof. If all machines become idle, then there are no new jobs and the sequence
is completed. Thus if $C_{on} = T$, there is at least one working machine at every
moment up to time T, the sum of all processing times is at least T, and $C_{opt} \ge T/m$, which
gives the competitive ratio of m.
We can define Greedy also for unrelated machines. The algorithm assigns a
job j to a machine i such that the running time of j on i is minimum over all i.

Theorem 6. The competitive ratio of Greedy is m for the unrelated machines
model with precedence constraints.

Proof. Since the time that the optimal off-line uses to run each job is at least
that of Greedy, we can imitate the proof of Theorem 5.
Abstract. Distributed systems execute background or alternative jobs
while waiting for data or requests to arrive from another processor. In
those cases, the following shut-down scheduling problem arises: given a
set of jobs of known processing time, schedule them on m machines so
as to maximize the total weight of jobs completed before an initially unknown
deadline. We will present optimally competitive deterministic and
randomized algorithms for shut-down scheduling. Our deterministic algorithm
is parameterized by the number of machines m. Its competitive
ratio increases as the number of machines decreases, but it is optimal
for any given choice of m. Such a family of deterministic algorithms can be
translated into a family of randomized algorithms that use progressively
less randomization and that are optimal for the given amount of randomization.
Hence, we establish a precise trade-off between amount of
randomization and competitive ratios. We also give a probabilistic analysis
for the cases of uniform and exponential distributions. Finally, we
report experimental results from trace-driven simulations.



1 Introduction 

Internet traffic alternates lulls with spikes of extreme activity. As a result,
Web performance is especially improved when operations are moved from peak
periods to intervening lulls. For example, idle periods can be exploited by servers
to speculatively disseminate data to dial-up clients, thus substantially reducing
the latency experienced to retrieve subsequent documents. Delays can stall
the execution of a distributed query in a Web-based database system so as to
trigger alternate queries or query plans.

Shut-Down Scheduling. In those architectures, if a job is preempted, it will not
be resumed and any partially completed work is lost. Consequently, the following
core optimization problem arises: a set of alternative or background jobs can
be scheduled during a lull. A lull has unknown duration because it ends
asynchronously when a message is received from a remote host. We will refer to this
problem as shut-down scheduling because job execution is unpredictably
interrupted. The off-line version of shut-down scheduling is a maximum 0/1 multiple
knapsack problem where all knapsacks have the same capacity. A book
summarizes results in the theoretical and practical solution of the multiple knapsack
problem: it is strongly NP-hard, and a polynomial-time approximation
scheme has been recently discovered. Several authors have considered an
on-line single knapsack problem where the deadline is known in advance, and
jobs arrive on-line. The on-line knapsack problem can be regarded as
the dual of shut-down scheduling and it is substantially an admission control
problem. In general, shut-down scheduling is related to on-line call control,
load balancing, and bin packing. Scheduling with machine
breakdowns has been considered as well: jobs must be scheduled on
m processors so as to complete in the presence of permanent or transient faults.
Breakdowns differ from shut-down in several respects, for example in arrival
times, objective function, job restart, and redundant scheduling, as well as in the
techniques and results of the analysis.

Our Results. We will present optimally competitive deterministic and random- 
ized algorithms for shut-down scheduling. Randomized algorithms can be fully 
derandomized provided that there are sufficiently many machines. If there is only 
a small number m of machines, we will give an optimal deterministic algorithm 
CSM that is parameterized by m. The competitive ratio of CSM increases as m 
decreases, but, for any given choice of m, our CSM algorithm is optimal. We will 
also interpret CSM as a family of randomized algorithms that use progressively 
less randomization at the price of a worse competitive ratio. Our algorithm is 
optimal for any given choice of the amount of randomization and coincides with 
the optimal deterministic and randomized algorithms in the two extreme cases. 
Thus, this algorithm establishes a precise trade-off between randomization and
competitive ratio. Randomized algorithms and lower bounds are transformed 
into deterministic algorithms and lower bounds by a technique that is simple 
and that might be more generally applicable to other scheduling problems. We 
report experimental results on Web trace simulations and indicate that a com- 
petitive algorithm indeed outperformed natural, but non-competitive strategies. 

Probabilistic Analysis. We will also conduct a probabilistic analysis of algorithms
for shut-down scheduling on m = 1 machine. Probabilistic analyses of knapsack
problems have been performed by several authors. A probabilistic
analysis was also performed for the on-line case. We will focus
on shut-down scheduling and on the case when the deadline D is exponentially
distributed. We will show a policy that maximizes the expected profit for the
exponential distribution. We also present a shut-down schedule that breaks ties
among jobs so as to minimize variance without worsening expected profit.
Therefore, the resulting strategy is, in the parlance of portfolio theory, E,V-efficient.

Contents. The paper is organized as follows. In Section 2 we introduce our notation for
shut-down scheduling. In Section 3 we present competitive analyses and give optimal
deterministic and randomized algorithms for shut-down scheduling. In Section 4 we
conduct a probabilistic analysis. In Section 5 we sketch some of the results of our
simulations.



2 Preliminaries 

In this section, we give definitions and notations for the shut-down scheduling
problem. First, we introduce our notation for the m = 1 machine problem.
The maximum 0/1 knapsack problem is: given lengths $l(i) \in \mathbb{N}$ ($i \in [n] =
\{1, 2, \dots, n\}$), profits $p(i) \in \mathbb{N}$ ($i \in [n]$), and a deadline D, find a subset $J \subseteq [n]$
such that $\sum_{i \in J} l(i) \le D$ that maximizes $\sum_{i \in J} p(i)$. We will now introduce some
notation. The profit $p(J)$ and length $l(J)$ of a set $J \subseteq [n]$ are defined in the
obvious way: $p(J) = \sum_{i \in J} p(i)$ and $l(J) = \sum_{i \in J} l(i)$. Let $p^*(D) = \max_{J : l(J) \le D} p(J)$
be the optimum 0/1 knapsack objective value. Let $\pi = (\pi_1, \pi_2, \dots, \pi_k)$ be a
k-permutation of [n], define $J_i = \{\pi_1, \pi_2, \dots, \pi_i\}$ ($1 \le i \le k$) and $p(\pi, D) = p(J_s)$
where s is the largest integer such that $l(J_s) \le D$. Note that when $k < n$ and
$D > l(J_k)$, the machine will remain idle after the completion of the k scheduled
jobs. Although it does not seem intuitive, some algorithms will in fact exploit the
fact that only some of the jobs are scheduled. Finally, we will omit the reference
to D in $p^*(D)$ and $p(\pi, D)$ when the deadline D is clear from the context.
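The two quantities $p(\pi, D)$ and $p^*(D)$ can be made concrete with a small sketch. It assumes integer lengths and a non-negative integer deadline, numbers jobs from 0 rather than 1, and uses a textbook dynamic program for the off-line optimum; the function names are illustrative and do not come from the paper.

```python
def schedule_profit(lengths, profits, order, deadline):
    """p(pi, D): profit of the longest prefix of `order` that finishes by the deadline."""
    elapsed, gained = 0, 0
    for job in order:
        elapsed += lengths[job]
        if elapsed > deadline:
            break  # this job and all later ones are cut off by the shut-down
        gained += profits[job]
    return gained


def optimum_profit(lengths, profits, deadline):
    """p*(D): best 0/1 knapsack value, by dynamic programming over capacities 0..D."""
    best = [0] * (deadline + 1)
    for length, profit in zip(lengths, profits):
        # Iterate capacities downwards so that each job is used at most once.
        for cap in range(deadline, length - 1, -1):
            best[cap] = max(best[cap], best[cap - length] + profit)
    return best[deadline]


if __name__ == "__main__":
    lengths, profits = [2, 3, 4], [3, 4, 5]
    print(schedule_profit(lengths, profits, (0, 1, 2), 5))
    print(schedule_profit(lengths, profits, (2, 0, 1), 5))
    print(optimum_profit(lengths, profits, 5))
```

The second ordering starts with the longest job and therefore completes only one job before the deadline, which is the kind of loss the on-line player has to guard against.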

We will consider a two-person zero-sum game which is based on the maximum
0/1 knapsack problem and which we call the knapsack game. In the knapsack
game, all the values $l(i)$ and $p(i)$ ($i \in [n]$) are known at the beginning, but the
deadline D is not. The player G selects a permutation $\pi$ of [n] and the player
H chooses D. If $p^*(D) > 0$, the quantity $v(\pi, D) = p(\pi, D)/p^*(D)$ will be G's
payoff corresponding to the strategies $\pi$ and D. If $p^*(D) = 0$, we define G's
payoff to be one. The objective of G is to maximize its payoff in the game. Since
G's payoff is always at most one, we can assume without loss of generality that
$D \ge \min_{i \in [n]} l(i)$. Notice that G has at most $n!$ strategies and H has at most
$2^n$ strategies, so that, for a given n, the knapsack game is a finite matrix game.
We interpret the knapsack game as an on-line problem as follows. We have a
set of n jobs numbered from 1 to n. Each job has a profit $p(i)$ and it takes $l(i)$
units of time to be completed. The on-line algorithm G starts to schedule jobs
on one machine according to some ordering $\pi$. At time D, the adversary shuts
the machine down, and G gains the values of all the jobs completed before D. A
strict competitive ratio is an upper bound to the inverse of the game value. We
do not allow additive terms in the competitive ratio because the game is finite.
We remark the difference among the following quantities relative to an (on-line)
algorithm:

Profit. Total profit of jobs completed before the deadline.
Payoff. Ratio of the algorithm's profit over the adversary's. The payoff is relative
to the chosen strategies for the on-line algorithm and for the adversary.
Game Value. Best payoff an on-line player can achieve.
Competitive Ratio. An upper bound on the inverse of the game value.
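To illustrate how these four quantities relate, the following sketch computes the payoff of a fixed ordering against a given deadline and then its worst case over all deadlines. It is an illustration under assumptions: lengths are integers, so only integer deadlines between the shortest job and the total length need to be tried, jobs are numbered from 0, and all function names are hypothetical.

```python
from fractions import Fraction


def online_profit(lengths, profits, order, deadline):
    """Profit of the jobs in `order` that complete by the deadline."""
    elapsed, gained = 0, 0
    for job in order:
        elapsed += lengths[job]
        if elapsed > deadline:
            break
        gained += profits[job]
    return gained


def offline_optimum(lengths, profits, deadline):
    """Adversary's profit: optimum 0/1 knapsack value for capacity `deadline`."""
    best = [0] * (deadline + 1)
    for length, profit in zip(lengths, profits):
        for cap in range(deadline, length - 1, -1):
            best[cap] = max(best[cap], best[cap - length] + profit)
    return best[deadline]


def payoff(lengths, profits, order, deadline):
    """v(pi, D) = p(pi, D) / p*(D), defined as one when p*(D) = 0."""
    best = offline_optimum(lengths, profits, deadline)
    if best == 0:
        return Fraction(1)
    return Fraction(online_profit(lengths, profits, order, deadline), best)


def worst_case_payoff(lengths, profits, order):
    """Payoff guaranteed by a deterministic ordering against every deadline."""
    deadlines = range(min(lengths), sum(lengths) + 1)
    return min(payoff(lengths, profits, order, d) for d in deadlines)


if __name__ == "__main__":
    lengths, profits = [1, 2], [1, 10]
    for order in ((0, 1), (1, 0)):
        print(order, worst_case_payoff(lengths, profits, order))
```

In this two-job instance the ordering (0, 1) guarantees a payoff of 1/10, so its strict competitive ratio on the instance is 10, while the ordering (1, 0) guarantees nothing because the adversary can stop the machine at time 1.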
It can be noticed that the competitive ratio is defined in terms of the inverse of
game values, which is the correct choice in maximization on-line games. We
observe that the knapsack game is trivial if all $p(i)$'s are equal (choose the jobs
in non-decreasing length order) or if all $l(i)$'s are equal (choose the jobs in
non-increasing profit order). In the more general scenario, we will assume that a job
can be scheduled on any one of m machines that run at the same speed. The
adversary will shut all machines down at the deadline D.

Henceforth, we will use natural logarithms because they simplify notation and
derivatives. Of course, the logarithm base does not alter the order of asymptotic
bounds. Finally, we introduce some quantities that will be fundamental to the
analysis below. Define $V = \max_{i \in [n]} p(i) / \min_{i \in [n]} p(i)$ as the ratio of the
largest to the smallest profit. Another important quantity is L, the number of
distinct length values in the job set, that is, $L = |\{l(j) : j \in [n]\}| \le n$. Finally,
we define the critical number of machines $\mu = \min\{L - 1, \ln V\}$.

3 Competitive Analysis 

In this section, we conduct competitiveness analysis for the shut-down scheduling 
problem. 

3.1 Randomized Algorithms 

We present strongly competitive randomized algorithms for shut-down scheduling.
Here, we will focus on the case when neither L nor V is O(1), and we will obtain
different competitive ratios depending on the relative growth rate of L and V.
We begin with a lower bound on the case of m = 1 machine.

Lemma 1. No randomized algorithm for the knapsack game can be better than
$\Omega(L)$-competitive when $V = \Omega(2^L)$ and better than $\Omega(\log V)$-competitive when
$V = o(2^L)$.

The proof will exploit the minimax principle.

Proof (Sketch). Let $\rho = 1/\sqrt[n-1]{V}$, $l(i) = n + i - 1$ and $p(i) = \lfloor p(1)\rho^{1-i} \rfloor$ for
all $i \in [n]$. Notice that $p(1) \le p(2) \le \dots \le p(n) \le p(1)V$, so that the ratio of
the largest to the smallest value is indeed bounded by V. The two fundamental
points of the proof are the following. First, if $D \le 2n - 1$, at most one job can
be scheduled before the deadline. If the on-line player guesses the right job, its
payoff is one. If it guesses a job that is longer than the deadline, its payoff is
naught. Finally, if it guesses a job that is shorter than the deadline, its payoff
is limited by the exponential growth of profits. The second point is to use the
minimax principle as follows. Let $c = n(1 - \rho) + \rho$. We can show a probability
distribution over D that forces any deterministic on-line strategy to have payoff
$O(1/c)$. By the minimax principle, the value of the game is $O(1/c)$, and so the
competitive ratio of a randomized on-line algorithm is $\Omega(c) = \Omega(n(1 - \rho))$. An
asymptotic analysis of c completes the proof. □
We now show that the same lower bound holds for an arbitrary number m
of machines.

Corollary 2. No randomized algorithm for the shut-down scheduling on m
machines can be better than $\Omega(L)$-competitive when $V = \Omega(2^L)$ and better than
$\Omega(\log V)$-competitive when $V = o(2^L)$.

Proof. The proof is a reduction to the case of m = 1 machine. Consider the same
counterexample as in Lemma 1 on $n'$ jobs and replicate each job m times, so
that the total number of jobs is now $n = n'm$. The number L of length classes
remains unchanged. Again, at most one job can complete on any machine when
$D \le 2n' - 1$. We will show how to convert any randomized algorithm for the
m machine instance into a randomized algorithm for the original one-machine
problem so that the two schedules achieve the same payoff. If $D \le 2n' - 1$, any
randomized strategy for m machines is completely characterized by the expected
number $f_i$ of machines starting a job of length $n' + i - 1$. Let h be the index with
$D = l(h)$. By linearity of expectation, the on-line expected profit is $\sum_{i=1}^{h} f_i\, p(i)$.
Meanwhile, the adversary's profit is $m\,p(h)$, so that the on-line expected payoff is
$\sum_{i=1}^{h} f_i\, p(i) / (m\,p(h))$. Consider an on-line algorithm for the one machine instance
that schedules job i with probability $f_i/m$. Its expected payoff is exactly the same
as the m machine algorithm for any choice of deadline $D \le 2n' - 1$. Hence, the
same lower bound as in Lemma 1 applies, and the proof is complete. □

If m > 1 and all $l(i)$'s are equal, then shut-down scheduling is trivial (schedule
jobs in non-increasing profit order). We turn to the case when the $p(i)$'s are equal,
and show an O(1)-competitive algorithm. This algorithm is an intermediate step
to solve the case of general profits. First, notice that when all profits are equal,
our objective is to maximize the number of completed jobs. Define the canonical
job scheduling algorithm for a set $C \subseteq [n]$ as a list scheduling algorithm that
orders the jobs from the shortest to the longest.

Lemma 3. Let $C \subseteq [n]$ be a set of jobs. The canonical schedule of C completes
at least 1/5 of the jobs completed by any other algorithm before the deadline D.

Define the load of a machine as the total length of jobs completed on that
machine before the deadline and the makespan as the maximum load of any one
machine. The proof will exploit a result for load balancing of permanent jobs.

Proof. The proof is organized as follows. We partition the jobs executed by the 
optimum into five classes, depending on their starting and completion time with 
respect to the deadline D and the makespan of the canonical schedule. Then, 
we show that no class contains more jobs than those completed by the canonical 
schedule before the deadline D. Hence, the canonical schedule completes at least 
1/5 of the jobs completed by the optimum, which will complete the proof. 

Assume without loss of generality that jobs are numbered in non-decreasing
order of length, that is, $l(j + 1) \ge l(j)$ for $j = 1, 2, \dots, n - 1$. Let G be the set
of jobs completed by the canonical schedule before D and let $M_G$ be the makespan
of G, that is, the time the last job in G is completed. By definition, $M_G \le D$.
Observe that initially G starts by assigning one job in [m] to each machine. Let
H be the largest set of jobs completed before D. Suppose first that $|H| \le m$.
Then, D is at least the length of the longest job in H, which is at least $l(|H|)$.
Hence, G completes at least one job on at least $|H|$ machines and the claim is
proven. Assume now $|H| > m$, so that $D \ge l(|H|) \ge l(m)$. Hence, $|G| \ge m$. We
now make the following definition: if $X \subseteq [n]$ is a set of jobs, then $M^*_X$ is the
minimum makespan to complete X. Observe that if $X \subseteq Y$, then $M^*_X \le M^*_Y$.
Analogously, if $X, Y \subseteq [n]$ and if there is a one-to-one mapping $f : Y \to X$ with
$l(j) \ge l(f(j))$ for $j \in Y$, then $M^*_X \le M^*_Y$. Let $M^*_G$ be the earliest time when
G can be completed. A load balancing result claims that $2M^*_G \ge M_G$. Schedule
H on m machines so that the schedule completes before time D and partition H
into five subsets according to this schedule as follows (Figure 1 gives an example
of such a partition).




Fig. 1. A partition of the optimal set H of jobs according to the optimal makespan
$M^*_G$ and the actual makespan $M_G$ of the on-line algorithm.



- The subset $H_1 \subseteq H$ is the set of jobs that complete before time $M^*_G$. We
claim that $|H_1| \le |G|$. Suppose by contradiction $|H_1| > |G|$. Take any proper
subset $H' \subset H_1$ with $|H'| = |G|$ elements. Since G consists of the $|G|$ shortest
jobs, there is a one-to-one mapping $f : H' \to G$ with $l(j) \ge l(f(j))$ for all
$j \in H'$. Hence, the optimal makespan $M^*_{H'}$ of $H'$ is not smaller than the optimal
makespan $M^*_G$ of G. Therefore, $M^*_G \le M^*_{H'} < M^*_G$ and a contradiction is
reached. Therefore, we conclude $|H_1| \le |G|$.

- The subset $H_2 \subseteq H$ is the set of jobs that start before time $M^*_G$ and complete
after time $M^*_G$. The set $H_2$ contains at most one job per machine, so that
$|H_2| \le m \le |G|$.

- The subset $H_3 \subseteq H$ is the set of jobs that start after time $M^*_G$ and complete
before time $M_G$. Since $M_G - M^*_G \le M^*_G$, we conclude that $|H_3| \le |G|$ with
an argument similar to that for $H_1$.

- The subset $H_4 \subseteq H$ is the set of jobs that start before time $M_G$ and
complete after time $M_G$. The set $H_4$ contains at most one job per machine
and so $|H_4| \le m \le |G|$.

- The subset $H_5 \subseteq H$ is the set of jobs that start after $M_G$. Notice that
$D - M_G < l(|G| + 1)$ or else G would have scheduled one more job. Hence,
$H_5$ can contain at most $|G|$ jobs because job $|G| + 1$ does not fit in
the allotted time $D - M_G < l(|G| + 1)$.

Notice that $H_1, H_2, \dots, H_5$ give indeed a partition of H. We conclude that $|H| \le
5|G|$, which proves the lemma. □

Corollary 4. The canonical schedule is 5-competitive for shut-down scheduling
on any number m of machines when $p(i) = 1$ for all jobs i.
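A minimal sketch of the canonical schedule may help to fix ideas. It assumes that list scheduling places the next job on the machine that becomes idle first, numbers jobs and machines from 0, and reports which jobs finish by a given deadline; the function name and the return format are illustrative choices, not part of the paper.

```python
import heapq


def canonical_schedule(lengths, m, deadline):
    """List-schedule jobs from shortest to longest on m identical machines.

    Returns (job, machine, finish_time) for every job completed by the deadline.
    """
    # Heap of (current load, machine index): the top entry is the first idle machine.
    machines = [(0, index) for index in range(m)]
    heapq.heapify(machines)
    completed = []
    for job in sorted(range(len(lengths)), key=lambda j: lengths[j]):
        load, index = heapq.heappop(machines)
        finish = load + lengths[job]
        if finish > deadline:
            # Every later job is at least as long and starts no earlier,
            # so none of them can complete before the shut-down either.
            break
        completed.append((job, index, finish))
        heapq.heappush(machines, (finish, index))
    return completed


if __name__ == "__main__":
    for entry in canonical_schedule([3, 1, 2, 2], m=2, deadline=3):
        print(entry)
```

The on-line algorithm itself never looks at the deadline; the `deadline` argument only plays the role of the adversary, cutting the schedule off so that the completed jobs can be counted.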

The canonical schedule algorithm easily generalizes to the case of arbitrary
profits by using the CRS techniques. Partition the job set into $O(\log V)$ profit
classes such that no job is more than O(1) times as profitable as any other job
in the same class. Then, extract a profit class at random and execute jobs only
from that class. However, if $V = \Omega(2^L)$, then jobs are partitioned according to
their length in such a way that a job class contains only jobs of the same length.
We conclude that

Theorem 5. The best randomized algorithm for shut-down scheduling is $O(L)$-competitive
when $V = \Omega(2^L)$ and $O(\log V)$-competitive when $V = o(2^L)$.
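The classification step can be sketched as follows. The sketch assumes that classes are formed by powers of e relative to the smallest profit, which gives at most $\ln V + 1$ classes; this factor is one possible choice of the O(1) constant and is not prescribed by the text. Job indices start at 0 and the function names are hypothetical.

```python
import math
import random


def profit_classes(profits):
    """Group job indices so that profits inside one class differ by a factor below e."""
    smallest = min(profits)
    classes = {}
    for job, profit in enumerate(profits):
        key = math.floor(math.log(profit / smallest))
        classes.setdefault(key, []).append(job)
    return [classes[key] for key in sorted(classes)]


def classify_and_randomly_select(lengths, profits, rng=random):
    """Pick one profit class uniformly at random and order it shortest job first."""
    chosen = rng.choice(profit_classes(profits))
    return sorted(chosen, key=lambda j: lengths[j])


if __name__ == "__main__":
    profits = [1, 2, 3, 8, 20, 21]
    lengths = [5, 1, 2, 2, 1, 9]
    print(profit_classes(profits))
    print(classify_and_randomly_select(lengths, profits, random.Random(1)))
```

The selected class is then executed with the canonical schedule, so that inside the class only the number of completed jobs matters up to a constant factor.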

A consequence of the matching upper and lower bounds is that if we change 
the number m of machines, we do not help nor hamper the competitive ratio of 
randomized algorithms. 

3.2 Deterministic Algorithms 

We now turn to deterministic algorithms. It is helpful during the discussion to
refer to Table 1, which summarizes our results. First, we argue that if there is
a sufficiently large number $m > \mu$ of machines, then we can find deterministic
algorithms that match the randomized lower bound.

              $V = o(2^L)$      $V = \Omega(2^L)$
  $m < \mu$     $O(m\rho)$        $O(m\rho)$
  $m > \mu$     $O(\log V)$       $O(L)$

Table 1. Competitive ratios of the best deterministic algorithm for the m machine
knapsack game, where L is the number of length classes, V is the ratio of largest and
smallest profit, $\mu = \min\{L - 1, \ln V\}$ is the critical number of machines, and $\rho = \sqrt[m]{V}$.

We derandomize the CRSlog
algorithm as follows. If we have $m \ge \mu + 1 \ge \ln V + 1$ machines, we can assign
$m' = \lfloor m/(\ln V + 1) \rfloor$ machines to process jobs in each profit class according to the
canonical schedule. Roughly speaking, the derandomized version translates the
probability of executing jobs in class $C_i$ into the fraction of machines assigned
to class $C_i$. It is critical that profit classes be disjoint sets, as otherwise a job
would have to be scheduled on more than one machine.

Lemma 6. The derandomized version of the CRSlog algorithm is $O(\log V)$-competitive
for shut-down scheduling on $m \ge \ln V + 1$ machines.

Proof. At any time, the algorithm has completed at least 1/5 of the jobs in a
certain class that are completed by any other algorithm that uses $m'$ machines
for that class. Hence, the adversary completes in each class at most $5m/m' =
O(\log V)$ times as many jobs as the derandomized CRSlog algorithm, and, on each job,
it earns less than e times as much as the derandomized CRSlog. Thus, this
algorithm is $O(\log V)$-competitive. □
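The machine allocation used by the derandomized algorithm can be sketched in a few lines. The sketch assumes $\lfloor \ln V \rfloor + 1$ profit classes (powers of e), gives each class a block of $m' = \lfloor m/(\ln V + 1) \rfloor$ consecutive machines, and leaves any remaining machines unused; the function name and the block layout are illustrative assumptions.

```python
import math


def machines_per_class(m, V):
    """Assign m' = floor(m / (ln V + 1)) machines to each profit class.

    Returns one list of machine indices per class; classes are assumed to be
    the floor(ln V) + 1 classes obtained by grouping profits by powers of e.
    """
    per_class = math.floor(m / (math.log(V) + 1))
    if per_class < 1:
        raise ValueError("derandomization needs at least ln V + 1 machines")
    num_classes = math.floor(math.log(V)) + 1
    return [
        list(range(c * per_class, (c + 1) * per_class))
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    # With V = 8 there are three classes; ten machines give three machines each.
    print(machines_per_class(10, 8))
```

Each block of machines would then run the canonical schedule on the jobs of its own class, which is why the classes must be disjoint.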

We can analogously derandomize the $O(L)$-competitive algorithm as long as
we have $m \ge \mu + 1 \ge L$ machines. It remains to establish deterministic
competitive ratios for $m < \mu$ machines. In this case, Corollary 2 is tight for randomized
algorithms, but gives a weak lower bound for deterministic algorithms.
Intuitively, the weakness of Corollary 2 stems from the fact that it is not always
possible to execute simultaneously all deterministic strategies that compose a
randomized algorithm if only few machines are available. Define $\rho = \sqrt[m]{V}$ (this
notation is independent of that in Lemma 1) and notice that $\rho > 1$. We will
frequently use the equalities $\ln \rho = (\ln V)/m$ and $m = \ln V / \ln \rho$.

Lemma 7. If $m < \min\{L - 1, \ln V\}$, then no deterministic algorithm can be
better than $\Omega(m\rho)$-competitive.

Notice that $m\rho = \omega(m \log \rho) = \omega(\log V)$. On the other hand, if $V = \Omega(2^L)$,
then $\rho = \Omega(2^{L/m}) = \omega(L/m)$, and so $m\rho = \omega(L)$. Hence, Lemma 7 dominates
Corollary 2 when $m < \mu$.



Proof (Sketch). The proof is based on an instance with the property that the 
adversary will be able to choose a bad deadline for any on-line algorithm. The 
instance consists of m + 1 classes of m identical jobs such that jobs in class i are
$\rho$ times more valuable than jobs in class i − 1. Job lengths are chosen in such a
way that only one job can complete on any one machine, which is similar to the
length distribution of Lemma 1. Since there are more classes than machines, the
on-line algorithm does not schedule any job from a certain class. The adversary 
chooses the deadline so that the optimum strategy schedules jobs only from that 
class, while the on-line algorithm achieves a small profit. We will now give some 
details of the arguments. 
Set-up. The proof is based on an instance where there are m identical jobs of
profit $\lfloor p_0 \rho^i \rfloor$ ($i = 0, 1, 2, \dots, m$) for some minimal profit $p_0$. Hence, the total
number of jobs is $n = m(m + 1)$. Notice that $L = m + 1 > m$. Observe that the
minimum profit is $p_0$ and the maximum profit is no more than $p_0 V$. A profit class
is a set of jobs with the same profit. We will think of profit classes as ordered
by the profit of the jobs they contain. A job of profit $p_0 \rho^i$ has length $2m + i$. If
$D \le 3m$, then at most one job can complete on any one machine.
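The instance of the set-up can be generated mechanically. The sketch below builds the m + 1 classes of m identical jobs, reading the job length as 2m + i, which is an assumption about the intended formula. It uses floating-point arithmetic for $\rho = \sqrt[m]{V}$, so the floor may be off by one when $\rho^i$ is an exact integer that is not exactly representable; the function name is hypothetical.

```python
import math


def lower_bound_instance(m, V, p0=1):
    """Return (length, profit) pairs: m identical jobs for each class i = 0..m.

    Class i has profit floor(p0 * rho**i) with rho = V**(1/m) and length 2m + i,
    so two jobs never fit on one machine before a deadline of at most 3m.
    """
    rho = V ** (1.0 / m)
    jobs = []
    for i in range(m + 1):
        job = (2 * m + i, math.floor(p0 * rho ** i))
        jobs.extend([job] * m)
    return jobs


if __name__ == "__main__":
    for job in lower_bound_instance(2, 16):
        print(job)
```

For m = 2 and V = 16 this yields three classes with profits 1, 4, and 16 and lengths 4, 5, and 6, so a deterministic algorithm with two machines must leave at least one class untouched.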

Holes. Since the number of profit classes is m + 1, there is at least one profit class
from which no job is completed before the deadline $D \le 3m$. We will say that
a hole is a maximal non-empty sequence of profit classes with the property that
no job has been scheduled from any class in the hole. Clearly, there is at least
one hole. If the first class $C_0$ is in a hole, then the adversary will set $D = 2m$,
and the on-line algorithm achieves no profit. Therefore, we can assume from now
on, without loss of generality, that the first class is not in a hole. Suppose that the
holes are $\mathcal{H}_1 < \mathcal{H}_2 < \dots < \mathcal{H}_l$. Let $k_i$ be the number of jobs scheduled from the
class immediately preceding hole $\mathcal{H}_i$. We claim that there is at least one hole $\mathcal{H}_i$
for which $k_i \le |\mathcal{H}_i| + 1$. Suppose that this is not true. Then, denote by $\nu$ the
number of classes outside the holes that are not immediately followed by a hole and observe that
the total number of scheduled jobs is at least $\nu + \sum_{i=1}^{l} (|\mathcal{H}_i| + 2) = m + 1 + l > m$, which is a
contradiction. The adversary chooses a deadline equal to the maximum length
of a job in a hole $\mathcal{H}$ such that $|\mathcal{H}| + 1$ is a bound on the number of jobs scheduled
in the class immediately before the hole.

An analysis of payoffs concludes the proof. □

This lower bound is matched by the following CSM (Canonical Schedule for
m Machines) algorithm. First, normalize job profits so that the minimum profit
is one and the maximum profit is V. CSM divides the job set into m classes
according to job profit, where class i consists of jobs of profit $\rho^{i-1} \le p(j) < \rho^i$.
CSM schedules jobs of class i on machine i according to the canonical schedule.

Lemma 8. The CSM algorithm is $O(m\rho)$-competitive for the knapsack game on
m machines.

Proof. Define the hth profit class as $C_h = \{j : \rho^{h-1} \le p(j) < \rho^h\}$. Let $x_h$ be the
number of jobs in class h scheduled by the optimum before the deadline D. Hence,
the optimum profit from class h is less than $x_h \rho^h$. The CSM algorithm schedules
at least $x_h/(5m)$ jobs in class h, so that its profit is at least $x_h \rho^{h-1}/(5m)$.
Hence, CSM has a payoff of at least

$$\frac{\sum_{h=1}^{m} x_h \rho^{h-1}}{5m \sum_{h=1}^{m} x_h \rho^{h}} = \frac{1}{5m} \cdot \frac{1}{\rho} = \frac{1}{5m\rho} .$$

The competitive ratio is the inverse of the payoff, and thus the proposition is
proven. □
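The CSM algorithm described above can be sketched as follows. The sketch makes a few assumptions that the text leaves open: profits are normalized by the smallest profit, class indices are computed with floating-point logarithms, jobs whose normalized profit equals V are placed in the last class, and jobs and machines are numbered from 0. All function names are illustrative.

```python
import math


def csm_queues(lengths, profits, m):
    """Return, for each of the m machines, its job queue in shortest-first order."""
    smallest = min(profits)
    rho = (max(profits) / smallest) ** (1.0 / m)
    queues = [[] for _ in range(m)]
    for job, profit in enumerate(profits):
        if rho > 1.0:
            # Class h (0-based) holds normalized profits in [rho**h, rho**(h+1)).
            h = int(math.log(profit / smallest) / math.log(rho))
            h = min(h, m - 1)  # the top profit V is kept in the last class
        else:
            h = 0  # all profits are equal
        queues[h].append(job)
    for queue in queues:
        queue.sort(key=lambda j: lengths[j])
    return queues


def csm_profit(lengths, profits, m, deadline):
    """Total profit of the jobs that CSM completes by the deadline."""
    total = 0
    for queue in csm_queues(lengths, profits, m):
        elapsed = 0
        for job in queue:
            elapsed += lengths[job]
            if elapsed > deadline:
                break
            total += profits[job]
    return total


if __name__ == "__main__":
    lengths, profits = [4, 1, 3, 2], [1, 2, 5, 16]
    print(csm_queues(lengths, profits, 2))
    print(csm_profit(lengths, profits, 2, 3))
```

With two machines and V = 16, the cheap jobs share the first machine and the valuable jobs share the second, so a short deadline still leaves the algorithm with one completed job from each profit range.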

The lower bound and the CSM algorithm are summarized by: 

Theorem 9. The best deterministic algorithm for the knapsack game on $m <
\min\{L - 1, \ln V + 1\}$ machines is $O(m \sqrt[m]{V})$-competitive.

3.3 Reduced Randomization

The CSM algorithm can be translated into a randomized algorithm, where a 
random class of jobs is scheduled according to the canonical schedule. We will 
name the resulting randomized algorithm CSMr. 

Lemma 10. The CSMr algorithm is $O(m\rho)$-competitive for the shut-down
scheduling problem.

Proof. Since there are m profit classes, jobs in the same class have profits that
are within a factor of $\rho$, and the canonical schedule has a performance guarantee
of 5, we obtain that the payoff of CSMr is at least $1/(5m\rho)$. □

Hence, CSMr gives a precise trade-off between randomization and competitive
ratio. Indeed, if no randomization is allowed, then the best algorithm is
$O(V)$-competitive. As the amount of randomization increases to m strategies
($m < \ln V$), performance improves as $O(m \sqrt[m]{V})$. Finally, when $m = \ln V + 1$,
the best algorithm is $O(\log V)$-competitive and, if $V = o(2^L)$, no further
improvement stems from adding more machines. Meanwhile, we claim that CSMr
achieves optimal performance.

Theorem 11. The best randomized algorithm that is a distribution over only m
deterministic strategies is $O(m\rho)$-competitive.

Proof (Sketch). Consider the proof of Lemma 7 and replace the number of
machines starting a certain job class with the expected number of machines. □

4 Probabilistic Analysis 

In this section, we will conduct probabilistic analyses of shut-down scheduling
on m = 1 machine. Let $\pi = (\pi_1, \pi_2, \dots, \pi_n)$ be a permutation of [n] and $J_i =
\{\pi_1, \pi_2, \dots, \pi_i\}$. Then,

$$E[p(\pi)] = \sum_{i=1}^{n} p(J_i) \Pr[l(J_i) \le D < l(J_{i+1})] = \sum_{i=1}^{n} p(\pi_i) \Pr[D \ge l(J_i)] . \tag{1}$$

Our objective is to find a permutation $\pi$ that maximizes (1). A corresponding
decision problem is to find a permutation $\pi$ such that $E[p(\pi)] \ge \rho$ for some
given $\rho$. This decision problem is easily seen to be NP-complete as it reduces
to a knapsack problem when there is a t with $\Pr[D = t] = 1$. First, we give a
general optimality criterion.
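The right-hand side of (1) translates directly into code once the distribution of D is given through the function $t \mapsto \Pr[D \ge t]$. The sketch below takes that function as a parameter; the name `survival` and the 0-based job numbering are illustrative assumptions.

```python
def expected_profit(lengths, profits, order, survival):
    """E[p(pi)] = sum_i p(pi_i) * Pr[D >= l(J_i)], following (1).

    `survival(t)` must return Pr[D >= t] for the deadline distribution.
    """
    elapsed, total = 0, 0.0
    for job in order:
        elapsed += lengths[job]  # completion time l(J_i) of this job
        total += profits[job] * survival(elapsed)
    return total


if __name__ == "__main__":
    lengths, profits = [2, 3, 4], [3, 4, 5]
    # Deterministic deadline D = 5: the expectation is the plain profit p(pi, 5).
    print(expected_profit(lengths, profits, (0, 1, 2), lambda t: 1.0 if t <= 5 else 0.0))
    # Deadline uniform on [0, 10].
    print(expected_profit(lengths, profits, (0, 1, 2), lambda t: 1.0 - t / 10))
```

The first call shows the degenerate case mentioned in the text: with a point-mass deadline the objective collapses to the knapsack-style profit of the ordering.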

Lemma 12. If a permutation $\pi$ maximizes (1), then, for all $i \in [n - 1]$,
$$p(\pi_i) \Pr[l(J_i) \le D < l(J_{i+1})] \ge p(\pi_{i+1}) \Pr[l(J_{i-1}) + l(\pi_{i+1}) \le D < l(J_{i+1})] .$$
Moreover, if $\pi$ maximizes (1) and
$$p(\pi_i) \Pr[l(J_i) \le D < l(J_{i+1})] = p(\pi_{i+1}) \Pr[l(J_{i-1}) + l(\pi_{i+1}) \le D < l(J_{i+1})] ,$$
then the permutation $\pi'$ obtained by exchanging $\pi_i$ and $\pi_{i+1}$ is also optimal.

Proof (Sketch). If this were not so, exchange jobs $\pi_i$ and $\pi_{i+1}$ to increase the
profit. Analogously, if equality holds, the profit remains unchanged, and thus
optimal. □

A simple corollary is that if all $p(i)$'s are equal, then the optimal solution is
to arrange jobs in increasing order of length, independently of the distribution
of D. Another simple consequence is that if D is uniformly distributed in an
interval $[0, A]$ of the real line with $A \ge l([n])$, then the optimal permutation is to
arrange jobs in non-increasing profit density $p(i)/l(i)$ (PD-order). The result for
the uniform distribution also follows by noticing that, under such distribution,
the objective (1) defines a weighted completion time problem, and we can apply
Smith's rule.
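The optimality of the PD-order under a uniform deadline can be checked by brute force on a small instance. The sketch assumes $\Pr[D \ge t] = 1 - t/A$ for $0 \le t \le A$ with A at least the total length, uses exact rational arithmetic, and numbers jobs from 0; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def pd_order(lengths, profits):
    """Jobs in non-increasing order of profit density p(i)/l(i)."""
    jobs = range(len(lengths))
    return sorted(jobs, key=lambda j: Fraction(profits[j], lengths[j]), reverse=True)


def expected_profit_uniform(lengths, profits, order, A):
    """Expected profit when D is uniform on [0, A] and A >= total length."""
    elapsed, total = 0, Fraction(0)
    for job in order:
        elapsed += lengths[job]
        total += profits[job] * (1 - Fraction(elapsed, A))
    return total


if __name__ == "__main__":
    lengths, profits, A = [3, 1, 2], [3, 4, 2], 10
    best = max(
        expected_profit_uniform(lengths, profits, order, A)
        for order in permutations(range(len(lengths)))
    )
    order = pd_order(lengths, profits)
    print(order, expected_profit_uniform(lengths, profits, order, A), best)
```

Maximizing the expected profit here is the same as minimizing $\sum_i p(\pi_i)\, l(J_i)$, which is the weighted completion time objective that Smith's rule solves.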

We now turn to the case when the deadline D is extracted according to an
exponential distribution with rate $\lambda$. The exponential distribution models the
case when client requests arrive according to a Poisson process, and each request
terminates a lull. First, recall that, if D is exponentially distributed, we have
$\Pr[D > t] = e^{-\lambda t}$. Then, expression (1) becomes

$$E[p(\pi)] = \sum_{i=1}^{n} p(\pi_i)\, e^{-\lambda l(J_i)} . \tag{2}$$

Define the exponential profit density of job i as the ratio $d_e(i) = p(i)/(e^{\lambda l(i)} - 1)$.

The Exponential Profit Density (EPD) algorithm arranges jobs in non-increasing
order of exponential profit density. We will say that a permutation is in
EPD-order if its jobs are in non-increasing order of exponential profit density.

Theorem 13. If $\Pr[D > t] = e^{-\lambda t}$, then a permutation maximizes (2) if and
only if it is in EPD-order.

Proof. If a permutation is optimal, then the optimality condition of Lemma 12
implies that it is in EPD-order. Conversely, assume that the identity is an
optimal EPD permutation, and suppose we exchange two terms h and h + 1 with
the same exponential profit density. Lemma 12 implies that the new permutation
is optimal as well. Any permutation in EPD-order can be obtained by a finite
sequence of exchanges of jobs with the same exponential profit density, and the proposition
is proven. □
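The EPD rule can be tested against exhaustive search on a small instance. The sketch assumes the density $p(i)/(e^{\lambda l(i)} - 1)$, which is what the exchange condition of Lemma 12 yields when $\Pr[D > t] = e^{-\lambda t}$; the parameter name `lam` and the function names are illustrative, and jobs are numbered from 0.

```python
import math
from itertools import permutations


def epd_order(lengths, profits, lam):
    """Jobs in non-increasing order of p(i) / (exp(lam * l(i)) - 1)."""
    jobs = range(len(lengths))
    return sorted(
        jobs,
        key=lambda j: profits[j] / math.expm1(lam * lengths[j]),
        reverse=True,
    )


def expected_profit_exponential(lengths, profits, order, lam):
    """Expected profit sum_i p(pi_i) * exp(-lam * l(J_i)) for an exponential deadline."""
    elapsed, total = 0.0, 0.0
    for job in order:
        elapsed += lengths[job]
        total += profits[job] * math.exp(-lam * elapsed)
    return total


if __name__ == "__main__":
    lengths, profits, lam = [3, 1, 2, 5], [3, 4, 2, 9], 0.3
    best = max(
        expected_profit_exponential(lengths, profits, order, lam)
        for order in permutations(range(len(lengths)))
    )
    order = epd_order(lengths, profits, lam)
    print(order, expected_profit_exponential(lengths, profits, order, lam), best)
```

Because the denominator grows exponentially with the length, a long job needs a much larger profit than under the PD rule to be scheduled early.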

The previous theorem suggests that in some sense lengths are exponentially
more important than values for an exponential distribution. On the other hand,
an exponential distribution can be approximated by a uniform distribution when
its rate $\lambda$ is small, that is, when the mean deadline $1/\lambda$ is large compared with
$l([n])$, in which case we can show that PD is within 1.1312 of the optimum.

We observe that there are in general several scheduling strategies in PD-order 
(EPD-order). Although any such strategy maximizes the expected profit, we will 
show that some optimal strategies have smaller variance than others. Variance 
analysis is based on the optimality conditions and on the following 
Lemma 14. If $\pi$ is an optimal permutation that has minimum $Var[p(\pi)]$ among
all optimal permutations and $p(\pi_i) \Pr[l(J_i) \le D < l(J_{i+1})] = p(\pi_{i+1}) \Pr[l(J_{i-1})
+ l(\pi_{i+1}) \le D < l(J_{i+1})] > 0$, then $p(\pi_i) \le p(\pi_{i+1})$.

Proof (Sketch). Since $\pi$ is optimal, Lemma 12 holds. Hence, we can only
exchange jobs for which the equality condition holds. Furthermore, $E[p(\pi)]$ is a
constant among all optimal permutations, so that minimizing the variance is
tantamount to minimizing the second moment $E[p^2(\pi)]$. Therefore, we seek
optimality conditions for the problem where $p(j)$ is replaced by $-p^2(j)$, subject to
the constraints given by Lemma 12. Such optimality conditions are found by an
exchange argument and the lemma is proven. □

It can be seen that a tie breaking procedure for the uniform and exponential 
distribution is to favor shorter jobs. Indeed, suppose that job i and i + 1 have 
the same (exponential) profit density and $p(i) \le p(i + 1)$. Then, $l(i) \le l(i + 1)$.
Hence, the optimal strategy that minimizes risk is to arrange jobs in PD-order 
(EPD-order) and break ties by scheduling shortest jobs first. 
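The effect of the tie-breaking rule can be seen on two jobs with the same profit density. The sketch assumes a deadline that is uniform on $[0, A]$ with A at least the total length, computes the mean and variance of the profit exactly, and uses illustrative function names with 0-based job indices.

```python
from fractions import Fraction


def pd_order_shortest_first(lengths, profits):
    """PD-order with ties broken in favor of shorter jobs."""
    jobs = range(len(lengths))
    return sorted(jobs, key=lambda j: (-Fraction(profits[j], lengths[j]), lengths[j]))


def profit_moments_uniform(lengths, profits, order, A):
    """Mean and variance of the profit when D is uniform on [0, A], A >= total length."""
    prefixes, elapsed, gained = [], 0, 0
    for job in order:
        elapsed += lengths[job]
        gained += profits[job]
        prefixes.append((elapsed, gained))
    mean = second = Fraction(0)
    for index, (finish, gained) in enumerate(prefixes):
        # The profit equals `gained` while D lies between this completion and the next.
        upper = prefixes[index + 1][0] if index + 1 < len(prefixes) else A
        probability = Fraction(upper - finish, A)
        mean += gained * probability
        second += gained * gained * probability
    return mean, second - mean * mean


if __name__ == "__main__":
    lengths, profits, A = [1, 2], [1, 2], 3
    for order in ([0, 1], [1, 0]):
        print(order, profit_moments_uniform(lengths, profits, order, A))
```

Both orderings have the same expected profit, but scheduling the shorter job first spreads the profit over more outcomes and gives the smaller variance.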



5 Simulations 

In this section, we sketch the set-up and results of our simulations. We postpone a 
complete account to the full paper. We simulated speculative data dissemination 
during server idle periods. Simulation is based on four Web server traces. The 
base server bandwidth value depends on the server load and is 8KB/s for cs.edu, 
16KB/s for epa-http and 64KB/s for NASA. We assume that each client has an 
extension cache of moderate size to keep both requested and disseminated
documents. The major performance measure is the average delay. The delay is 
the time for a client to receive the requested document, and it includes a fixed 
network latency, the transmission time (document size over client bandwidth), 
and the time spent in the server queue. We simulated five strategies: 

1. Traditional http with no client cache, which is our baseline strategy. 

2. The http protocol with extension client caches of 128 KB.

3. Profit density (PD). 

4. PD(rho), which is similar to PD but takes into account the packet drop 
probability in estimating document profits, and 

5. CSM for m = 1 machine, which sends documents from the shortest to the
longest.

Figure 2 compares delays normalized to the baseline http value. The CSM
algorithm outperformed PD in all traces, except cs.edu, where the two algo- 
rithms have nearly equal performance. The PD algorithm performed better if 
it uses packet drop probabilities (PD(rho)), but even so, it was almost always 
outperformed by CSM. Both CSM and PD consistently improved over the case 
when no speculative dissemination is executed. 
[Figure 2: bar chart of relative delays. Legend: CSM, PD(rho), PD, http+cache,
http. Trace labels: epa-http, nasa (Aug 4), nasa (Aug 14).]


Fig. 2. Relative delays of traditional http with and without caches, PD with no knowl- 
edge of the packet drop probability (PD), PD with perfect knowledge of the packet 
drop probability (PD(rho)), and CSM (m = 1). Sum of individual document latencies 
is normalized to traditional http with no client cache. Results for four Web server traces 
are reported. 
Abstract. We consider load balancing in the following setting. The on-line
algorithm is allowed to use $n$ machines, whereas the optimal off-line
algorithm is limited to $m$ machines, for some fixed $m < n$. We show that
while the greedy algorithm has a competitive ratio which decays linearly
in the inverse of $n/m$, the best on-line algorithm has a ratio which decays
exponentially in $n/m$. Specifically, we give an algorithm with competitive
ratio of $1 + 1/2^{\frac{n}{m}(1-o(1))}$, and a lower bound of $1 + 1/e^{\frac{n}{m}(1+o(1))}$ on the
competitive ratio of any randomized algorithm.

We also consider the preemptive case. We show an on-line algorithm with
a competitive ratio of $1 + 1/e^{\frac{n}{m}(1+o(1))}$. We show that the algorithm is
optimal by proving a matching lower bound.

We also consider the non-preemptive model with temporary tasks. We
prove that for $n = m + 1$, the greedy algorithm is optimal. (It is not
optimal for permanent tasks.)

1 Introduction 

Competitive analysis has been criticized for being too pessimistic. This worst 
case analysis sometimes fails to differentiate between algorithms whose perfor- 
mance is observed empirically to be very different. A general method to circum- 
vent these shortcomings was introduced by Kalyanasundaram and Pruhs 
d: resource augmentation. For certain scheduling problems with unbounded 
competitive ratio, they show that it is possible to attain a good competitive ratio 
if the machines of the on-line algorithm are slightly faster than the machines of 
the off-line algorithm. 

Resource augmentation has been applied to a number of problems. It was al- 
ready used in the paper where the competitive ratio was introduced m : here the 
performance of some paging algorithms was studied, where the on-line algorithm 
has more memory than the optimal off-line one. 

In several machine scheduling and load balancing problems 
the effect of adding more or faster machines has been studied. 
We consider the following load balancing problem. Jobs arrive on-line, where
job $j$ has a certain weight $w_j$. The job has to be assigned immediately to a
machine, adding $w_j$ to the machine's load. The on-line algorithm has $n$ identical
machines, and it is compared to an optimal offline algorithm which has $m < n$
identical machines.

For a job sequence $\sigma$ we write $A_n(\sigma)$ for the maximum load of $A$ on $n$
machines when it is given this job sequence. Analogously, we write $OPT_m(\sigma)$. We
denote the competitive ratio of an online algorithm $A$ with $n$ machines relative
to an optimal offline algorithm with $m$ machines by $C_{m,n}(A)$. Specifically,

$$C_{m,n}(A) = \max_{\sigma} \frac{A_n(\sigma)}{OPT_m(\sigma)}.$$

The classical case of $n = m$ was considered in a series of papers.
The best upper bound is 1.923 due to Albers P and the best lower bound is 
1.853 P3 based on p. The case n > m was introduced by Brehob et al jS|. 
They showed that no matter how many machines the on-line algorithm has, it 
can never perform optimally: $C_{m,n}(A) > 1$ for all $n > m \geq 2$. However, one may
expect that for reasonable algorithms $C_{m,n}(A)$ would approach 1 when $t = n/m$
increases. In fact, P showed that the greedy algorithm has a competitive ratio 
which approaches 1 in a rate depending linearly on 1/t. 

In contrast, while the greedy algorithm has a competitive ratio which
approaches 1 in a rate depending linearly on $1/t$, we design a non-greedy
algorithm whose competitive ratio approaches 1 in a rate depending exponentially
on $t$. More specifically, we give an algorithm of competitive ratio $1 + 1/2^{t(1-o(1))}$.
Moreover, we show that the competitive ratio of any on-line algorithm cannot
decrease faster than exponentially in $t$ by proving a lower bound of $1 + 1/e^{t(1+o(1))}$
on the competitive ratio of any on-line algorithm. We also show for $n = 2m$ a
lower bound of 5/4.

We also consider the preemptive case. Here we view load as time. Each job 
may be assigned to one or more machines and time slots, where the time slots 
have to be disjoint. The assignment has to be determined completely at the 
arrival of a job. Using similar techniques as in mm we prove a lower bound 
of $1/(1 - (1 - 1/m)^n) = 1 + 1/e^{t(1+o(1))}$ on the competitive ratio of any randomized
preemptive algorithm. We also show a matching upper bound by adapting the 
optimal preemptive algorithm of 0 to our problem. 

We can also view time as a separate axis and not as the load axis. Here jobs 
arrive and depart at arbitrary times and the cost of an algorithm is the maximum 
load over time and machines. This model is called the temporary tasks model 
(the case where jobs only arrive is called the permanent tasks model). It was 
proved in [5] that for $n = m$ the greedy algorithm, which is $(2 - 1/m)$-competitive,
is optimal for this model. We show that if $n$ is just slightly larger than $m$, i.e.,
$n = m + 1$, then greedy, which is $(2 - 2/(m + 1))$-competitive, is also optimal. Note
that the results in pp| implies that the greedy algorithm is not optimal in general 
for permanent tasks also for n > m. 



Resource Augmentation in Load Balancing 191 



2 Permanent Tasks 

In this section we check the growth of the competitive ratio as a function of 
t = n/m. We start with the competitive ratio of the greedy algorithm. This 
algorithm was first given by Graham (m- and assigns each new job to the least 
loaded machine. The following lemma is shown in 0 using a similar analysis as 

in [TT] : 

Lemma 1. The competitive ratio of the greedy algorithm is $1 + \frac{m-1}{n}$.

The above lemma implies a competitive ratio which is a linear function in $1/t$.
Surprisingly, we can give an algorithm called Buckets which has a competitive
ratio $1 + 1/2^{t(1-o(1))}$.
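
As an illustration of the greedy bound, the following Python sketch runs greedy on $n$ machines, computes the optimal load on $m$ machines by exhaustive search, and checks the inequality greedy $\leq (1 + (m-1)/n) \cdot$ OPT on small random instances. The function names, the integer weights and the random instances are our own assumptions; the exhaustive search is only feasible for a handful of jobs.

```python
import random
from itertools import product

def greedy_max_load(weights, n):
    """Assign each job to the currently least loaded of n machines."""
    loads = [0] * n
    for w in weights:
        i = min(range(n), key=loads.__getitem__)
        loads[i] += w
    return max(loads)

def opt_max_load(weights, m):
    """Optimal maximum load on m machines by brute force (tiny inputs only)."""
    best = None
    for assignment in product(range(m), repeat=len(weights)):
        loads = [0] * m
        for w, i in zip(weights, assignment):
            loads[i] += w
        if best is None or max(loads) < best:
            best = max(loads)
    return best

def check_greedy_bound(m, n, trials=50, seed=0):
    """Check greedy_n <= (1 + (m - 1)/n) * OPT_m in exact integer arithmetic."""
    rng = random.Random(seed)
    for _ in range(trials):
        weights = [rng.randint(1, 9) for _ in range(rng.randint(1, 7))]
        if greedy_max_load(weights, n) * n > (n + m - 1) * opt_max_load(weights, m):
            return False
    return True
```

A passing check on random instances is only a sanity test of the upper bound, not a proof, and it says nothing about tightness.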



2.1 Algorithm Buckets 

For describing the algorithm Buckets we assume that $t \geq 3$. (If $t < 3$ we use the
greedy algorithm.) Let $0 < \varepsilon < 1$ be some parameter to be fixed later. We partition
all machines into buckets: $k = \lfloor t - 2/\varepsilon \rfloor$ small buckets, each of which contains $m$
machines, and one big bucket that contains all other machines. Note that the
big bucket contains at least $2m/\varepsilon$ machines.

Algorithm Buckets maintains a value $\lambda$. Denote by $\lambda_i$ the value of $\lambda$ after
the arrival of $i$ jobs and by $OPT_i$ the optimal load after $i$ jobs. The algorithm
consists of phases. During a phase $j$, the algorithm can use only the big bucket
and the small bucket number $j \bmod k$. We assign the first job to the first small
bucket and initialize $\lambda_1 = w_1$. We modify $\lambda$ only when a new phase starts while
keeping the following two invariants on $\lambda$:

- $\max_{j \leq i} w_j \leq \lambda_i$

- $(2 - \varepsilon)\,OPT_i \geq \lambda_i$

On arrival of a job $i$ (starting from $i = 2$), we do the following: If $w_i \leq \lambda_{i-1}/2$,
assign $i$ greedily to the least loaded machine in the big bucket. If $\lambda_{i-1}/2 < w_i \leq
\lambda_{i-1}$, and there is a machine in the small bucket which was not used in the
current phase, assign $i$ to this machine. Finally, if all $m$ machines in the current
small bucket were used in the current phase, or if $w_i > \lambda_{i-1}$, then a new phase
begins: we define $\lambda_i = \max((2 - \varepsilon)\lambda_{i-1}, w_i)$ and the job is assigned to a machine
in the next small bucket.
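
The assignment rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name, the machine numbering (small buckets first, big bucket last), and passing the number of small buckets k as a parameter are our own assumptions.

```python
def buckets_assign(weights, m, k, n, eps):
    """Simulate algorithm Buckets and return (machine loads, final lambda).

    Assumed layout: machines 0 .. k*m-1 form k small buckets of m machines
    each, machines k*m .. n-1 form the big bucket.
    """
    if k < 1 or n <= k * m or not 0 < eps < 1:
        raise ValueError("need k >= 1, a non-empty big bucket and 0 < eps < 1")
    loads = [0.0] * n
    big = range(k * m, n)
    lam = None       # the value lambda maintained by the algorithm
    phase = 0
    used = set()     # small-bucket machines already used in the current phase
    for w in weights:
        bucket = range((phase % k) * m, (phase % k) * m + m)
        free = [x for x in bucket if x not in used]
        if lam is None:                  # first job: first small bucket
            lam, target = w, bucket[0]
            used.add(target)
        elif w <= lam / 2:               # small job: greedy on the big bucket
            target = min(big, key=lambda x: loads[x])
        elif w <= lam and free:          # medium job: unused small machine
            target = free[0]
            used.add(target)
        else:                            # bucket exhausted or w > lambda
            lam = max((2 - eps) * lam, w)
            phase += 1
            target = (phase % k) * m
            used = {target}
        loads[target] += w
    return loads, lam

loads, lam = buckets_assign([1, 0.4, 0.8, 0.9, 3], m=2, k=2, n=8, eps=0.5)
```

In the usage line, the fourth job finds both machines of the first small bucket used and opens a new phase, and the fifth job exceeds $\lambda$ and opens another one, which wraps around to the first small bucket.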

Theorem 1. The algorithm Buckets is $1 + 1/2^{t(1-o(1))}$ competitive for an
appropriate choice of $\varepsilon$.

Proof. We start by showing that both invariants hold after the arrival of a job
(and thus hold throughout the execution of Buckets). After the assignment of
the first job, $\lambda_1 = OPT_1 = w_1$, and both invariants hold since $\varepsilon < 1$.

The first invariant always holds, since when a job which is larger than $\lambda$
arrives, $\lambda$ is modified. To show that the second invariant holds, we show that $\lambda$
is increased only when the previous $\lambda$ is smaller than the current OPT, and that
$\lambda$ is not increased too much. If $\lambda$ is increased since $\lambda_{i-1} < w_i$, then $OPT_i \geq w_i$
and since $\lambda_i = \max((2 - \varepsilon)\lambda_{i-1}, w_i)$ then $\lambda_i \leq (2 - \varepsilon)w_i \leq (2 - \varepsilon)OPT_i$. If $\lambda$
is increased since all the machines in the small bucket were used in the current
phase, then there are at least $m + 1$ jobs of weight more than $\lambda_{i-1}/2$ and hence the
optimal schedule has to assign two of them on one machine, yielding $OPT_i >
\lambda_{i-1}$. Thus $\lambda_i \leq (2 - \varepsilon)OPT_i$.

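Next we show that the maximum load in the big bucket never exceeds $OPT_i$
at step $i$ (after arrival of job $i$). It is easy to see that the maximum load of
running greedy on $\alpha m$ machines is at most $\frac{1}{\alpha}OPT + \max_{j \leq i} w_j$. Since $w_j \leq \lambda_{i-1}/2$
and $\lambda_{i-1}/(2 - \varepsilon) \leq OPT_{i-1}$, the load is bounded by $(\frac{\varepsilon}{2} + \frac{2 - \varepsilon}{2})OPT_{i-1} \leq (\frac{\varepsilon}{2} +
\frac{2 - \varepsilon}{2})OPT_i = OPT_i$.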

Last, we bound the maximum load on the small bucket machines. When a
new phase starts, the value of $\lambda$ is multiplied by at least $2 - \varepsilon$. Each machine in
a small bucket is used at most once in each phase.

Consider a job which is assigned to a small bucket machine in the last time
it is used. Denote this job by $i'$, and let $\lambda' = \lambda_{i'}$. Then the previous job assigned
to the same machine is of weight at most $\lambda'/(2 - \varepsilon)^k$. Moreover, a job that
was assigned $r \geq 1$ jobs before $i'$ to the same machine is of weight at most
$\lambda'/(2 - \varepsilon)^{rk}$. Thus the total weight of all jobs on this machine, except $i'$, is at
most $2\lambda'/(2 - \varepsilon)^k$. Since $OPT \geq \lambda'/2$, the total weight of jobs on
this machine is at most

$$w_{i'} + \frac{4\,OPT}{(2 - \varepsilon)^k} \leq \left(1 + \frac{4}{(2 - \varepsilon)^k}\right) OPT \leq \left(1 + \frac{4}{(2 - \varepsilon)^{t - 2/\varepsilon - 1}}\right) OPT.$$

Choosing an appropriate value of $\varepsilon$ would give the required competitive ratio
(for example $\varepsilon = 1/\sqrt{t}$ is a suitable value). □



2.2 Lower Bounds 

We begin by giving a simple exponential lower bound: 

Theorem 2. The competitive ratio of any deterministic on-line algorithm is at
least $1 + 1/2^{2t-1}$.

Proof. We give a proof for even $m$ and for integer $t$. It is easy to extend the proof
for all cases. The sequence consists of $n + m/2$ jobs that arrive in $2t + 1$ phases.
Phase 1 consists of $m/2$ unit jobs, and phase $i$ for $i > 1$ consists of $m/2$ jobs of weight
$2^{i-2}$. The sequence stops after a phase in which the on-line algorithm schedules
two jobs on one machine. (If the algorithm reaches the last phase, there are more jobs
than on-line machines, therefore the on-line algorithm has two jobs on one machine.) The
optimal off-line load after every phase is the weight of the last job. If the on-line
algorithm has two jobs on one machine, its load is at least $1 + x$ where $x$ is the weight of
the last job. The minimum value of the ratio $(1 + x)/x$ would be $1 + 1/2^{i-2}$ where $i = 2t + 1$,
hence $1 + 1/2^{2t-1}$ is a lower bound on the competitive ratio. □
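
The adversarial sequence can be checked numerically with a short Python sketch. It assumes the reading in which every phase has $m/2$ jobs, phase 1 has unit jobs and phase $i > 1$ has jobs of weight $2^{i-2}$; the function names are hypothetical. The off-line schedule used here puts the jobs of the last phase alone on $m/2$ machines and stacks one job from each earlier phase on each of the remaining $m/2$ machines.

```python
def adversary_phases(m, t):
    """Return the 2t + 1 phases as (job weight, number of jobs) pairs."""
    if m % 2:
        raise ValueError("this sketch handles even m only")
    return [(1 if i == 1 else 2 ** (i - 2), m // 2) for i in range(1, 2 * t + 2)]

def offline_load(prefix):
    """Maximum load of the explicit off-line schedule after the given phases."""
    last_weight = prefix[-1][0]
    stacked = sum(weight for weight, _ in prefix[:-1])  # one job per earlier phase
    return max(last_weight, stacked)

phases = adversary_phases(m=4, t=3)
final_ratio_denominator = phases[-1][0]   # the bound is 1 + 1/final_ratio_denominator
```

Since the last job's weight is also a lower bound on any schedule, the explicit schedule is optimal whenever offline_load returns that weight.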
We can give a slightly better lower bound; this bound holds for deterministic
and randomized algorithms. In fact, we show a lower bound on preemptive
algorithms versus a non-preemptive optimal algorithm. Hence our lower bound holds
both for the preemptive and non-preemptive models. The lower bound builds on
the lower bounds given by Sgall and independently by Chen, van Vliet
and Woeginger.

The main idea here is to use small jobs and a sequence of $n$ big jobs $J_i$ for
$1 \leq i \leq n$ of increasing weight so that the optimal off-line load after job $J_i$, which
we denote by $OPT_i$, is exactly equal to the weight of $J_i$. Hence, the weight of
each big job is equal to the total weight of all previous jobs divided by $m - 1$.
Specifically, the sequence begins by very small jobs of total weight $m - 1$ followed
by the sequence of the $n$ big jobs. The weight of $J_i$ for $1 \leq i \leq n$ is $\mu^{i-1}$ where
$\mu = \frac{m}{m-1}$.

Lemma 2. The optimal off-line load for the above sequence is $\mu^{k-1}$ after the
arrival of the job $J_k$, for $1 \leq k \leq n$.

Proof. We consider an algorithm which assigns all jobs on off-line machines, and
show that the resulting load is $\mu^{k-1}$.

The algorithm assigns jobs to the off-line machines greedily, in non-increasing
order (sorted according to weight). This is equivalent to using the LPT rule. We
show that no big job is assigned in a way that some load exceeds $\mu^{k-1}$. Note
that the total weight of all small jobs and first $j$ big jobs is $\mu^{j}(m - 1) = \mu^{j-1} m$.

Assume that the assignment of job $J_j$ causes the maximum load to exceed
$\mu^{k-1}$. This means that all other machines are loaded by more than $\mu^{k-1} - \mu^{j-1}$.
Since the total weight of jobs smaller or equal to $J_j$ is $\mu^{j-1} m$ we get that the
total weight of jobs is more than $\mu^{k-1} m$ which is a contradiction. Hence, the
assignment of the small jobs results in balanced machines, each with load of
$\mu^{k-1}$. □

The following lemma, adapted from pinj . is the key of lower bounding the 
competitive ratio. 

Lemma 3. For any deterministic or randomized, preemptive or non-preemptive
algorithm, for the sequence above the following holds: $r \geq W / \sum_{i=1}^{n} OPT_i$, where $r$ is
the competitive ratio and $W$ is the total weight of the jobs.



Proof. Denote by $A(J_i)$ the maximum load of the on-line algorithm $A$ after the
assignment of the job $J_i$. Then

$$\frac{\sum_{i=1}^{n} E(A(J_i))}{\sum_{i=1}^{n} OPT_i} \leq \frac{\sum_{i=1}^{n} r \cdot OPT_i}{\sum_{i=1}^{n} OPT_i} = r.$$

Hence it is enough to show that $\sum_{i=1}^{n} E(A(J_i)) \geq W$.

Assume that $A$ is deterministic. For $1 \leq l \leq n$ let $T_l$ be the load on the $l$'th
machine at the end of the sequence after sorting the machines by non-increasing
load. Removing any $l - 1$ jobs still leaves a machine with load at least $T_l$ and
thus $A(J_i) \geq T_{n-i+1}$. Since $W = \sum_{l=1}^{n} T_l$ we conclude that

$$\sum_{i=1}^{n} A(J_i) \geq \sum_{i=1}^{n} T_{n-i+1} = W$$

as needed. If $A$ is randomized, we average over the deterministic algorithms and
conclude that

$$\sum_{i=1}^{n} E(A(J_i)) \geq W.$$

□

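Theorem 3. The competitive ratio of an on-line algorithm, deterministic or
randomized, preemptive or non-preemptive, is at least $1/(1 - (1 - 1/m)^n) = 1 + 1/e^{t(1+o(1))}$.

Proof. We use the above job sequence and apply Lemma 3. We have

$$W = \mu^n (m - 1), \qquad \sum_{i=1}^{n} OPT_i = \sum_{i=1}^{n} \mu^{i-1} = \frac{\mu^n - 1}{\mu - 1},$$

and

$$r \geq \frac{\mu^n (m - 1)(\mu - 1)}{\mu^n - 1} = \frac{\mu^n}{\mu^n - 1} = \frac{1}{1 - \mu^{-n}}$$

as needed. □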
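
The arithmetic behind this bound can be replayed exactly with Python's Fraction type. The sketch below assumes the job sequence reads as follows: small jobs of total weight $m - 1$, then $n$ big jobs where each big job weighs the total weight of all earlier jobs divided by $m - 1$. It uses the average load as the off-line load, which equals the weight of the newest big job. The function name is hypothetical.

```python
from fractions import Fraction

def preemptive_lower_bound(m, n):
    """Return W / sum(OPT_i) for the big-job sequence, as an exact fraction."""
    total = Fraction(m - 1)            # total weight of the small jobs
    opt_sum = Fraction(0)
    for _ in range(n):
        weight = total / (m - 1)       # next big job
        total += weight
        assert total / m == weight     # average load equals the newest job
        opt_sum += weight
    return total / opt_sum

bound = preemptive_lower_bound(m=2, n=4)
```

For any $m \geq 2$ the returned value agrees with the closed form $1/(1 - (1 - 1/m)^n)$, which is what the test below checks for two parameter choices.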

We can improve the bound for the special case t = 2 for the non-preemptive 
deterministic case. 



Claim. The competitive ratio of any on-line algorithm for $n = 2m$, where $m \geq 8$,
is at least 5/4.

Proof. We use a job sequence consisting of four phases: 

- $m$ jobs of weight 1

- $\lfloor m/2 \rfloor$ jobs of weight 3/2

- $\lfloor m/3 \rfloor + 1$ jobs of weight 3

- $\lfloor (m+1)/6 \rfloor + 1$ jobs of weight 4.

The sequence stops after a phase in which the on-line schedules two jobs on one 
machine. Note that the sequence contains more than 2m jobs. 



m mod 6          0        1        2        3        4        5
Amount of jobs   2m + 2   2m + 1   2m + 1   2m + 1   2m + 1   2m + 1
We show that the optimal load in phase $i$ is $i$. This is clear for phases 1 and
2. In phase 3, if the machines are packed to a maximum load of 3, at most 2.5
of space can be lost: 2 if a job of weight 1 has to go on its own machine, and
0.5 if there is an odd number of jobs of weight 1.5. The total weight is at most
$m + \frac{3m}{4} + (m + 3) = \frac{11m}{4} + 3$, which is at most $3m - 2.5$ for $m \geq 22$. This implies
that the machines can be packed with a maximum load of 3 for $m \geq 22$. By
inspection, the machines can be packed for $8 \leq m \leq 21$ too.

In phase 4, the total weight is at most $\frac{11m}{4} + 3 + \frac{4(m+1)}{6} + 4 = \frac{41m}{12} + \frac{23}{3}$. In an optimal
packing, at most 3.5 of space is lost. We have $\frac{41m}{12} + \frac{23}{3} \leq 4m - 3.5$ which holds
for $m \geq 20$. Therefore the optimal algorithm can maintain a load of 4 in phase
4, if $m \geq 20$. By inspection, it works for $8 \leq m \leq 19$ as well.

As an example, we give the optimal schedules for phases 3 and 4 when $m = 8$
and $m = 9$ (see Figure 1).



Fig. 1. The last phases for m = 8, 9 (panels: m = 8 and m = 9)



Depending on the phase in which the on-line algorithm puts two jobs on the
same machine, we find competitive ratios of 2, 5/4, 4/3 and 5/4. Hence the competitive
ratio is at least 5/4. □
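
The counting in this proof can be replayed with a small Python sketch. It assumes the phase sizes $m$, $\lfloor m/2 \rfloor$, $\lfloor m/3 \rfloor + 1$ and $\lfloor (m+1)/6 \rfloor + 1$ with weights 1, 3/2, 3 and 4, which is one reading consistent with the job counts and thresholds stated above; the names are hypothetical.

```python
from fractions import Fraction

WEIGHTS = [Fraction(1), Fraction(3, 2), Fraction(3), Fraction(4)]

def phase_sizes(m):
    """Number of jobs in each of the four phases (assumed reading)."""
    return [m, m // 2, m // 3 + 1, (m + 1) // 6 + 1]

def collision_ratios():
    """Ratio forced when the first collision happens in phase 1, 2, 3 or 4.

    The cheapest collision puts a job of the current phase on top of a unit
    job, while the optimal load in phase i is i.
    """
    return [(1 + w) / (i + 1) for i, w in enumerate(WEIGHTS)]

def total_weight(m):
    return sum(count * w for count, w in zip(phase_sizes(m), WEIGHTS))
```

With these sizes the sequence always has more than $2m$ jobs, so an on-line algorithm with $2m$ machines cannot avoid a collision, and the smallest forced ratio is 5/4.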

2.3 An Optimal Preemptive Algorithm 

The last part of this section presents an optimal preemptive on-line algorithm. 
The algorithm is similar to the algorithm in |3. 

Let $r = 1/(1 - \mu^{-n})$, where $\mu = m/(m - 1)$ as before. We denote the load on
machine $i$ at time $T$ by $L_i^T$. The
algorithm maintains three invariants, which hold at any step $T$:

- $L_1^T \leq L_2^T \leq \ldots \leq L_n^T$.

- $L_n^T \leq r \cdot OPT^T$.

- For $1 \leq k \leq n$, $\sum_{i=1}^{k} L_i^T \leq \frac{\mu^k - 1}{\mu^n - 1} W^T$,

where $W^T$ is the total weight of jobs which arrived till time $T$.



Similarly to that algorithm, we try to maintain a ratio of $\mu$ between machine
loads. We show how to assign a new job $j$ with weight $w_j$, arriving at time
$T + 1$, to $n$ machines. First the new optimal load is computed by $OPT^{T+1} = \max(W^{T+1}/m,
\max_{1 \leq i \leq T+1} w_i)$, and then the following intervals are reserved for $j$: for
$1 \leq l \leq n - 1$, we reserve $[L_l^T, L_{l+1}^T]$, and for $l = n$, reserve $[L_n^T, r \cdot OPT^{T+1}]$. Note
that these intervals are disjoint. Next, for $l = n$ down to 1, assign a portion out
of $w_j$ of size equal to the size of the reserved interval. We do that until we run
out of $w_j$. (The last portion assigned might be smaller than the interval.)
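
A single assignment step of this scheme can be sketched in Python as follows. The sketch assumes the loads are given in non-decreasing order and that the caller supplies the cap $r \cdot OPT$ for the new optimal load; the function name and the return format are our own.

```python
def assign_preemptive(loads, weight, cap):
    """Distribute one job over the reserved intervals, most loaded machine first.

    loads:  current machine loads, sorted in non-decreasing order
    cap:    r times the new optimal load (upper end of the last interval)
    Returns (new loads, list of (machine, start, end) pieces, unassigned weight).
    """
    n = len(loads)
    new_loads = list(loads)
    pieces = []
    remaining = weight
    for l in range(n - 1, -1, -1):
        if remaining <= 0:
            break
        upper = cap if l == n - 1 else loads[l + 1]   # end of reserved interval
        size = min(upper - loads[l], remaining)
        if size > 0:
            pieces.append((l, loads[l], loads[l] + size))
            new_loads[l] = loads[l] + size
            remaining -= size
    return new_loads, pieces, remaining

new_loads, pieces, left = assign_preemptive([0, 1, 3], weight=4, cap=5)
```

In the usage line the job occupies the time interval from 3 to 5 on the most loaded machine and from 1 to 3 on the next one, so its pieces never overlap in time. A nonzero third return value would mean the reserved intervals were too small for the job.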

It is easy to follow the proof in 0, replacing the number of machines used by 
the on-line algorithm from m to n. The proof shows that each job is completely 
distributed to the machines and that the invariants hold. By that we conclude 
that the algorithm is r-competitive as required. 



3 Temporary Tasks 

Recall that for $n = m$ the greedy algorithm is $(2 - 1/m)$-competitive for
permanent tasks as well as for temporary tasks. Greedy is not optimal for permanent
tasks, but is optimal for temporary tasks. Also for $n > m$, it is easy to see that
greedy has the same competitive ratio for temporary tasks as for permanent
tasks, which is $1 + (m - 1)/n$. However, in contrast to the case $n = m$, greedy
is not optimal for temporary tasks, since algorithm Buckets (defined on tempo- 
rary tasks) achieves a better competitive ratio for large n. Specifically, it is easy 
to see that the same analysis of the competitive ratio of algorithm Buckets for 
permanent tasks also holds for temporary tasks. However, we show that if the 
online algorithm has one more machine than the optimal offline algorithm then 
the greedy is still optimal. 

Theorem 4. Greedy is optimal for temporary tasks for n = m + 1. 

Proof. We need to show a lower bound of $2m/(m + 1)$ on the competitive ratio of any
on-line algorithm. The proof consists of two parts: one for odd m and one for 
even m. In the proof we mention the value of the optimal load only when the 
value increases. 



Case A. $m$ is odd. We start the sequence by $(m - 1)m^2$ unit-weight jobs. The
optimal load is $m(m - 1)$. We distinguish between two cases:
Case A1. The online algorithm places at least $m(m - 1)$ jobs on one machine,
say machine $x$.

In this case, all the jobs leave except $m(m - 1)$ jobs on $x$. Then, $m(m - 1)$
jobs of weight $m - 1$ arrive. Since the optimal load is again $m(m - 1)$, at most
$m - 2$ of them can go on $x$. Otherwise the load would be $(2m - 1)(m - 1)$ on $x$,
and $(2m - 1)/m > 2m/(m + 1)$. So $(m - 1)^2 + 1$ of these jobs must go on the $m$
empty machines. We distinguish between two sub-cases:

Case A1a. One machine (not $x$) has at least $m$ jobs of weight $m - 1$.

All jobs of weight $m - 1$ leave except $m$ jobs of weight $m - 1$ on one machine,
and $m - 1$ jobs of weight $m(m - 1)$ arrive. The new optimal load is $(m + 1)(m - 1)$.
Therefore all these jobs must go on different machines. Finally, a job of weight
$m(m + 1)$ arrives. This completes the proof since the online load is $2m^2$, while
the optimal load is $m(m + 1)$: the last job has its own machine, the other machines
have one job of weight $m(m - 1)$, one or two jobs of weight $m - 1$ and some jobs
of weight 1, so that the load is precisely $m(m + 1)$.

Case A1b. All machines (except machine $x$) have at least one job of weight
$m - 1$.

All jobs of weight $m - 1$ leave except $m$ jobs, one such job on each machine
except machine $x$. Next, $\frac{m^2 - 2m - 1}{2}$ jobs of weight $2(m - 1)$ arrive. The optimal
load is again $m(m - 1)$. At most $\frac{m - 3}{2}$ are assigned to machine $x$, otherwise the
load there is too large. There are $\frac{m - 3}{2} + \frac{1}{m}$ jobs on average on the other machines,
so there is at least one machine (not $x$) with at least $\frac{m - 1}{2}$ jobs of this weight
and a load of at least $m(m - 1)$, say machine $y$. All jobs leave except the unit
jobs on $x$ and jobs of total weight precisely $m(m - 1)$ on machine $y$.

Finally, $m - 1$ jobs of weight $m(m - 1)$ arrive and one job of weight $m(m + 1)$.
Clearly, the online algorithm must assign each job of weight $m(m - 1)$ to an empty
machine and hence its final load is $2m^2$. The optimal algorithm can balance its
jobs to a load of $m(m + 1)$ since there are at least $2(m - 1)$ jobs of weight 1,
which completes the proof.

Case A2. All machines now have load at least $m - 1$.

All jobs leave except $m - 1$ jobs on each machine, and $m^2 - m - 1$ jobs of
weight $m - 1$ arrive. The average number of jobs of weight $m - 1$ on the machines
is $m - 2 + \frac{1}{m + 1}$, and hence there is a machine with $m - 1$ jobs of weight $m - 1$
and a load of $m(m - 1)$. The loads are now the same as in Case A1b just before
the arrival of the jobs of weight $2(m - 1)$. Hence, we can continue as in that case.

Case B. $m$ is even. We start the sequence by $(m - 1)m^2$ unit jobs. The optimal
load is $m(m - 1)$. We distinguish between two cases:

Case B1. One machine, say $x$, has at least $m(m - 1)$ jobs. All jobs leave except
$m(m - 1)$ jobs on $x$, and $(m - 1)^2$ jobs of weight $m$ arrive. The optimal load is
again $m(m - 1)$. We distinguish between two sub-cases:
Case B1a. Another machine (not $x$) has load at least $m(m - 1)$. Then all jobs
of weight $m$ leave except $m - 1$ jobs on one machine, and $m - 1$ jobs of weight
$m(m - 1)$ arrive followed by a job of weight $m(m + 1)$. Clearly, the online load
is $2m^2$, while the optimal load is $m(m + 1)$ which completes the proof.

Case B1b. Each machine except $x$ has one job of weight $m$. All jobs of weight
$m$ leave except $m$ jobs, one on each machine except on machine $x$. Next $\frac{m^2 - 3m}{2}$
jobs of weight $2m$ arrive. At most $\frac{m}{2} - 2$ can go on machine $x$. Hence, the average
number of jobs of weight $2m$ on machines different than $x$ is $\frac{m}{2} - 2 + \frac{2}{m}$. Thus,
one machine must have $\frac{m}{2} - 1$ jobs of weight $2m$ and a load of at least $m(m - 1)$.
All jobs leave except the unit jobs on $x$ and jobs of total weight $m(m - 1)$ on
the other machine. Finally, $m - 1$ jobs of weight $m(m - 1)$ arrive and one job
of weight $m(m + 1)$. Clearly, the online load is $2m^2$, while the optimal load is
$m(m + 1)$ which completes the proof.

Case B2. There are at least $m$ jobs on each machine.

All jobs leave except $m$ jobs on each machine. Next, $\frac{m^2(m - 2) - m}{2}$ jobs of
weight 2 arrive. If there is a machine with load at least $m(m - 1)$, we continue
as in Case B1. Otherwise, each machine has load at least $2m$. Then, some jobs
of weight 2 leave in such a way that the load on each machine is $2(m - 1)$. Next,
$m^2 - 2m - 2$ jobs of weight $m - 1$ arrive. Then, one machine will have a load of
at least $m(m - 1)$. Jobs of weight $m - 1$ on that machine leave such that the load
becomes $m(m - 1)$. All non-unit jobs on the other machines leave. We continue
as in Case B1b. □
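
Some of the arithmetic used repeatedly in this proof can be checked mechanically. The Python sketch below verifies, for a given $m$, that the target bound $2m/(m + 1)$ equals $2 - 2/(m + 1)$, that $(m - 1)m^2$ unit jobs give an off-line load of $m(m - 1)$, the two counting facts of Case A1, and that the final loads $2m^2$ and $m(m + 1)$ yield exactly the target ratio. The function name is hypothetical and the sketch covers only these identities, not the case analysis itself.

```python
from fractions import Fraction

def check_theorem4_arithmetic(m):
    """Assert the basic identities behind the lower bound; return the bound."""
    target = Fraction(2 * m, m + 1)
    assert target == 2 - Fraction(2, m + 1)
    # (m - 1) m^2 unit jobs spread evenly over the m off-line machines
    assert Fraction((m - 1) * m * m, m) == m * (m - 1)
    # Case A1: m - 1 jobs of weight m - 1 on top of m(m - 1) unit jobs on x
    load_on_x = m * (m - 1) + (m - 1) * (m - 1)
    assert load_on_x == (2 * m - 1) * (m - 1)
    assert Fraction(load_on_x, m * (m - 1)) > target
    # at most m - 2 of the m(m - 1) jobs go on x, the rest go elsewhere
    assert m * (m - 1) - (m - 2) == (m - 1) ** 2 + 1
    # final step: on-line load 2 m^2 against optimal load m(m + 1)
    assert Fraction(2 * m * m, m * (m + 1)) == target
    return target

bound_for_three = check_theorem4_arithmetic(3)
```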

4 Conclusions 

We have examined the effects of resource augmentation for several load balancing
problems. For the problem of scheduling jobs on identical machines, we have
shown an algorithm whose competitive ratio approaches 1 exponentially fast in
$n/m$, while the competitive ratio of greedy approaches 1 only linearly in $m/n$.

An open question is whether it is possible to close the gap between the lower 
bound and the upper bound on identical machines. Both bounds are decreasing 
exponentially, and we conjecture that the true value of the competitive ratio is 
closer to the lower bound. 
Abstract. We consider the Unrestricted Bin Packing problem where we
have bins of equal size and a sequence of items. The goal is to maximize 
the number of items that are packed in the bins by an on-line algorithm. 
We investigate the power of performing admission control on the items, 
i.e., rejecting items while there is enough space to pack them, versus 
behaving fairly, i.e., rejecting an item only when there is not enough 
space to pack it. We show that by performing admission control on the 
items, we get better performance for various measures compared with 
the performance achieved on the fair version of the problem. Our main 
result shows that we can pack 2/3 of the items for sequences in which 
the optimal can pack all the items. 



1 Introduction 

1.1 General 

In this paper, we are investigating the competitive ratio for a bin packing prob- 
lem. However, in addition to considering unrestricted request sequences, we 
also consider some restricted sequences which we refer to as accommodating 
sequences. Informally, these are sequences where an optimal algorithm can sat- 
isfy all requests. Clearly, the competitive ratio on accommodating sequence^ 
is no worse than the competitive ratio on unrestricted sequences for any given 
problem and sometimes can be much better. For problems where the compet- 
itive ratio is a bad measure, it may be useful to compare algorithms by their 
competitive ratio on accommodating sequences. Specifically, it was shown in 
m that there are (benefit) problems where the competitive ratio tends to zero 
while the competitive ratio on accommodating sequences is a constant, i.e., in- 
dependent of the parameters of the problem. Moreover, when we are trying to 
distinguish between two algorithms, the competitive ratio on accommodating 
sequences may prefer one algorithm while the competitive ratio measure (on all 
sequences) prefers the other [51 . 

* Supported in part by the Israel Science Foundation, and by a USA-Israel BSF grant. 

** Supported in part by the Danish Natural Science Research Council (SNF). 

^ In earlier papers gEE], this competitive ratio on accommodating sequences was 
called the accommodating ratio. The change is made here for consistency with com- 
mon practice in the field. 
In the Bin Packing problem we are given some bins and the goal is to pack a
set of items into these bins. We concentrate on the benefit variant of the problem, 
where there are n bins and the objective is to maximize the total number of items 
in these bins. This problem has been studied in the off-line setting, starting in 0, 
and its applicability to processor and storage allocation is discussed in 0. (For 
surveys on bin packing, see unE].) 

In the on-line version of the problem the items arrive in some sequence and 
the assignment of an item should be done before the next item arrives. We as- 
sume that the items are integer-sized and the bins all have size k. One can discuss 
the Fair Bin Packing problem where it is required that the packing be fair, that
is, an item can only be rejected if it cannot fit in any bin at the time when it is 
given. Note that the optimal algorithm is also required to be fair. It is shown in 
0 that for this problem, Worst-Fit has a strictly better competitive ratio than 
First-Fit, while First-Fit has a strictly better competitive ratio than Worst-Fit 
on accommodating sequences. In this case, the competitive ratio on accommo- 
dating sequences seems the more appropriate measure, since it is constant while 
the competitive ratio (on all sequences) is close to zero, for large values of k, ba- 
sically due to some sequences which seem very contrived. This demonstrated the 
usefulness of the more general accommodating function jO] which comprises the 
competitive ratio as well as the competitive ratio on accommodating sequences 
(it is a function of the restriction on the request sequences). 
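
To make the fair setting concrete, here is a minimal Python sketch of fair First-Fit and Worst-Fit for the benefit variant: an item is rejected only when it fits in no bin, and the function returns the number of accepted items. It assumes the standard definitions of the two rules (first bin with room, respectively bin with the most free space) and hypothetical names; ties in Worst-Fit go to the lowest-numbered bin.

```python
def fair_pack(items, n, k, rule):
    """Count the items accepted by a fair algorithm on n bins of size k.

    rule: "first" for First-Fit, "worst" for Worst-Fit.
    """
    free = [k] * n
    accepted = 0
    for size in items:
        fitting = [b for b in range(n) if free[b] >= size]
        if not fitting:
            continue                      # fair: reject only when nothing fits
        if rule == "first":
            b = fitting[0]
        elif rule == "worst":
            b = max(fitting, key=lambda j: free[j])
        else:
            raise ValueError("rule must be 'first' or 'worst'")
        free[b] -= size
        accepted += 1
    return accepted

# An accommodating sequence for n = 2 bins of size k = 4.
first_fit_count = fair_pack([2, 2, 3, 1], n=2, k=4, rule="first")
worst_fit_count = fair_pack([2, 2, 3, 1], n=2, k=4, rule="worst")
```

On this small sequence, which can be packed completely, First-Fit accepts all four items while Worst-Fit spreads the two items of size 2 over both bins and must reject the item of size 3.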

Here, we consider what happens when the fairness restriction is removed. 
Thus, for the on-line problem Unrestricted Bin Packing (UBP), there are again 
n bins, all of size k, the items are integer-sized, and the goal is to maximize the 
total number of items placed in the bins, but there is no fairness restriction. 

We note that on accommodating sequences, the competitive ratio of UBP 
is no worse than the competitive ratio of the fair problem, since the optimal 
algorithm serves all the requests and hence is fair. In general, however, the 
competitive ratio of UBP is not necessarily better than the competitive ratio 
of the fair problem since the optimal algorithms may be different. In fact, in 
many cases, considering unfair algorithms, i.e., performing admission control on 
the requests, is the more challenging problem; see for example the results for 
throughput routing in P333I- In particular, with the Unrestricted Bin Packing 
problem, it is easier to differentiate between algorithms since both their com- 
petitive ratio and their competitive ratio on accommodating sequences can vary 
over a large range. This is in contrast to on-line algorithms for Fair Bin Packing 
where all of them must have both within a constant factor of each other. 



1.2 Accommodating Sequences and the Accommodating Function 

For completeness, we define the competitive ratio and the accommodating func- 
tion for Unrestricted Bin Packing. Note that Unrestricted Bin Packing is a max- 
imization problem, and all ratios are less than or equal to 1. 

^ In 0 where some of the results from 0 were first presented in a preliminary form, 
this problem was called Unit Price Bin Packing. 
Let $A(I)$ denote the number of items algorithm $A$ accepts when given request
sequence $I$ and let $OPT(I)$ denote the number an optimal off-line algorithm,
OPT, accepts. An on-line algorithm, $A$, is $c$-competitive if there exists a constant
$b$, such that $A(I) \geq c \cdot OPT(I) - b$ for all sequences $I$. The competitive ratio is
$CR = \sup\{c \mid A \text{ is } c\text{-competitive}\}$.

Next, we introduce the restricted request sequences. We say that $I$ is an
$\alpha$-sequence, if $I$ could be packed in $\alpha n$ bins. We investigate the competitive ratio on
such restricted sequences. To be precise, an on-line algorithm $A$ is $c$-competitive
on $\alpha$-sequences if $c \leq 1$ and there exists a constant $b$, such that for every
$\alpha$-sequence $I$, $A(I) \geq c \cdot OPT(I) - b$. The accommodating function $\mathcal{A}$ is defined as
$\mathcal{A}(\alpha) = \sup\{c \mid A \text{ is } c\text{-competitive on } \alpha\text{-sequences}\}$.

Thus, the accommodating function for an algorithm is the competitive ratio
of that algorithm on $\alpha$-sequences as a function of $\alpha$. We refer to 1-sequences
as accommodating sequences, since the optimal algorithm can accommodate all
requests in such a sequence. We use AR to denote the competitive ratio on
accommodating sequences.

1.3 Results 

We prove results on the Unrestricted Bin Packing problem for the usual com- 
petitive ratio, the competitive ratio on accommodating sequences and the ac- 
commodating function. We start with the competitive ratio. 

For the usual competitive ratio we prove the following: 

— The algorithm Log (Section has a competitive ratio of 

— No on-line algorithm can have a competitive ratio which is better than 

when considering randomized algorithms. 

— We observe that the competitive ratios of First-Fit and Worst-Fit are 

These results should be compared with the competitive ratio of any on-line 
algorithm for the fair problem: they are all 0(^) 0. 

For the competitive ratio on accommodating sequences we prove: 

— The competitive ratio of Log on accommodating sequences is $\Theta(\frac{1}{\log k})$.

— We conclude from |S| that the competitive ratio of First-Fit on accommodating
sequences is between $\frac{5}{8}$ and $\frac{7}{11}$, since the fairness restriction on OPT
is irrelevant when all of the items can be packed.

— We design an unrestricted algorithm, Unfair-First-Fit, whose competitive
ratio on accommodating sequences is $\frac{2}{3}$, which is strictly higher than the
competitive ratio of First-Fit on accommodating sequences.

— The competitive ratio of any on-line algorithm on accommodating sequences
is no better than $\frac{6}{7}$, even when considering randomized algorithms.

Thus, according to the usual competitive ratio, Log is the better algorithm, and
according to the competitive ratio on accommodating sequences, First-Fit is the 
better algorithm (the same is true for Log and Unfair-First-Fit). 

For the accommodating function we prove the following:

— We design randomized and deterministic algorithms for which the accommodating
function evaluated at any constant $\alpha$ is a constant, if the algorithm
is given the value $\alpha$.

— In contrast, we observe that First-Fit’s (and Unfair-First-Fit’s) accommodating
function drops down to $\Theta(\frac{1}{k})$ for $\alpha \ge 1 + c$, for any constant $c > 0$.

The main technical effort is to prove the competitive ratio of the algorithm 
Unfair-First-Fit on accommodating sequences. The other results are easier to 
prove. Algorithm Log uses derandomization of the standard classify and select 
technique. The proof of the lower bound for Log is similar to the lower bound 
proof in |2|, and the proof of the general upper bound for the competitive ratio 
is analogous to the proof of the corresponding lemma in 

Remark: In this paper, we assume that all items are integer-sized and the 
bins have size $k$. All of the results hold with the weaker assumption that the
bins are unit-sized and the smallest item has size at least $\frac{1}{k}$. However, some of
the results in do not appear to hold with this assumption, so we use the 
stronger assumption for consistency. 



2 The Competitive Ratio 

2.1 First-Fit and Worst-Fit 

It is easy to see that the competitive ratio of First-Fit or Worst-Fit for
Unrestricted Bin Packing is $\frac{1}{k}$. For the upper bound, consider the sequence consisting
of $n$ items of size $k$ followed by $n \cdot k$ items of size 1. For the lower bound, note
that if First-Fit (or Worst-Fit) rejects anything, it accepts at least $n$ items, and
no algorithm can accept more than $n \cdot k$ items. From that it follows that
First-Fit’s (and Worst-Fit’s) accommodating function drops down to $\frac{1}{k}$ for $\alpha \ge 2$.
Moreover, it is $\Theta(\frac{1}{k})$ for $\alpha \ge 1 + c$, for any constant $c > 0$, by using $(\alpha - 1) n \cdot k$
(instead of $n \cdot k$) items of size 1.
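
The upper-bound sequence above can be replayed with a small sketch. The function names `first_fit` and `worst_case_sequence` are illustrative and not from the paper; the code only shows that First-Fit fills every bin with one item of size k and then has no room for the unit items.

```python
def first_fit(items, n, k):
    """Pack items on-line into n bins of capacity k; return the accepted sizes."""
    load = [0] * n
    accepted = []
    for size in items:
        for b in range(n):
            if load[b] + size <= k:      # first bin with enough room
                load[b] += size
                accepted.append(size)
                break                    # items that fit nowhere are rejected
    return accepted


def worst_case_sequence(n, k):
    """n items of size k followed by n*k items of size 1."""
    return [k] * n + [1] * (n * k)


n, k = 3, 4
online = len(first_fit(worst_case_sequence(n, k), n, k))
optimal = n * k  # OPT rejects the large items and packs every unit item
```

With these values the on-line count is n and the off-line count is n · k, which gives the ratio 1/k used in the argument.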

2.2 Algorithm Log 

In the description of the algorithm Log, we assume that $n \ge c \lceil \log_2 k \rceil$, for some
constant $c > 1$. If $n$ is smaller, we can use simple randomization to achieve the
same results.

Log divides the $n$ bins into $\lceil \log_2 k \rceil$ groups $G_1, G_2, \ldots, G_{\lceil \log_2 k \rceil}$. Let
$p = \lfloor n / \lceil \log_2 k \rceil \rfloor$ and let $s = n - p \cdot \lceil \log_2 k \rceil$. Groups $G_1, G_2, \ldots, G_s$ consist of $p + 1$
bins and the rest of the groups consist of $p$ bins. Let $S_1 = \{x \mid \frac{k}{2} \le x \le k\}$, and
$S_i = \{x \mid \frac{k}{2^i} \le x < \frac{k}{2^{i-1}}\}$, for $2 \le i \le \lceil \log_2 k \rceil$. When Log receives an item $o$
of size $s_o \in S_i$, it decides which group $G_j$ of bins to pack it in by calculating
$j = \max\{j \le i \mid \text{there is a bin in } G_j \text{ that has room for } o\}$. If $j$ exists, $o$ is packed
in $G_j$ according to the First-Fit packing rule. If not, the item $o$ is rejected.
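
A minimal sketch of Log as described above, assuming integer item sizes between 1 and k and k at least 2. The class name `Log` and the helpers `ceil_log2` and `size_class` are illustrative names, not the authors' code.

```python
def ceil_log2(k):
    """Ceiling of log2(k) for an integer k >= 2."""
    return (k - 1).bit_length()


def size_class(x, k):
    """Index i of the interval S_i containing size x (S_1 holds k/2 <= x <= k)."""
    i = 1
    while x * 2 ** i < k:
        i += 1
    return i


class Log:
    def __init__(self, n, k):
        self.k = k
        groups = ceil_log2(k)
        p, s = divmod(n, groups)          # s = n - p * groups
        # Group j (1-based) owns p + 1 bins if j <= s, otherwise p bins.
        self.loads = {j: [0] * (p + 1 if j <= s else p)
                      for j in range(1, groups + 1)}

    def offer(self, x):
        """Return True if the item of size x is accepted."""
        # Try the largest group index j <= i first, then smaller ones.
        for j in range(size_class(x, self.k), 0, -1):
            bins = self.loads[j]
            for b in range(len(bins)):    # First-Fit inside the group
                if bins[b] + x <= self.k:
                    bins[b] += x
                    return True
        return False


log = Log(8, 16)
accepted_large = sum(log.offer(16) for _ in range(8))
```

With n = 8 and k = 16 there are four groups of two bins, so only the two bins of the first group can take items of size 16.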

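Theorem 1. The competitive ratio of Log is $\Theta(\frac{1}{\log k})$, even on accommodating
sequences.

Proof. Consider first the lower bound. For $i \in \{1, 2, \ldots, \lceil \log_2 k \rceil\}$, let $n_i(I)$
denote the number of items of size $s \in S_i$ accepted by OPT when given the
sequence $I$ of items. Since group $G_i$ is reserved for items of size $\frac{k}{2^{i-1}}$ or smaller,
the bins in group $G_i$ will receive at least $\min\{2^{i-1} p, n_i(I)\}$ items. OPT can
accept at most $2^i n$ items with sizes in $S_i$, i.e. $n_i(I) \le 2^i n$. Thus,
$2^{i-1} p \ge 2^{i-1}\left(\frac{n}{\lceil \log_2 k \rceil} - 1\right) \ge n_i(I)\left(\frac{1}{2\lceil \log_2 k \rceil} - \frac{1}{2n}\right)$.
Given the same sequence, Log packs at least $n_i(I)\left(\frac{1}{2\lceil \log_2 k \rceil} - \frac{1}{2n}\right)$ items in $G_i$,
for $i \in \{1, 2, \ldots, \lceil \log_2 k \rceil\}$. So, for any $I$,

$$\frac{\mathrm{Log}(I)}{\mathrm{OPT}(I)} \ge \frac{\sum_{i} n_i(I)\left(\frac{1}{2\lceil \log_2 k \rceil} - \frac{1}{2n}\right)}{\sum_{i} n_i(I)} = \frac{1}{2\lceil \log_2 k \rceil} - \frac{1}{2n},$$

where both sums range over $i \in \{1, 2, \ldots, \lceil \log_2 k \rceil\}$, so
$CR_{\mathrm{Log}} \ge \frac{1}{2\lceil \log_2 k \rceil} - \frac{1}{2n}$.

For the upper bound, consider the sequence $I$ with $n$ items of size $k$. Then,

$$\frac{\mathrm{Log}(I)}{\mathrm{OPT}(I)} \le \frac{\lfloor n / \lceil \log_2 k \rceil \rfloor + 1}{n} \le \frac{1}{\lceil \log_2 k \rceil} + \frac{1}{n},$$

so $AR_{\mathrm{Log}} \le \frac{1}{\lceil \log_2 k \rceil} + \frac{1}{n}$. Since all sequences are considered for the competitive
ratio, $CR_{\mathrm{Log}} \le AR_{\mathrm{Log}}$, and the result follows. □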



2.3 An Upper Bound on the Competitive Ratio 

In this section, we consider an arbitrary on-line algorithm A for Unrestricted 
Bin Packing and prove general bounds on how well it can do. First, note that 
the only possible lower bound on the competitive ratio, even on accommodating 
sequences, is zero, since for the algorithm which simply rejects everything, the 
ratio is equal to zero. 

Clearly, the algorithm Log does not have the best possible competitive ratio 
on accommodating sequences, but its competitive ratio is quite close to optimal. 

Theorem 2. Any deterministic or randomized algorithm for Unrestricted Bin
Packing has a competitive ratio of less than $\frac{2}{\log_2 k}$.

Proof. Assume that $k$ is a power of 2. The items are given in phases numbered
$0, 1, \ldots, r$, $r \le \log_2 k$. In phase $i$, $n 2^i$ items of size $k / 2^i$ are given. Clearly, any
optimal off-line algorithm will accept all $n 2^r$ items in phase $r$.

Let $x_i$ be the expected number of items that the on-line algorithm accepts in
phase $i$, $0 \le i \le r$, and $x_i = 0$, $r < i \le \log_2 k$. By the linearity of expectations,
the expected total number of items accepted by the on-line algorithm is $\sum_{i=0}^{r} x_i$,
and the expected total volume of the items accepted is $\sum_{i=0}^{r} k 2^{-i} x_i$. Since
there are only $nk$ units of capacity overall, we get $\sum_{i=0}^{\log_2 k} k 2^{-i} x_i \le nk$, or
$\sum_{i=0}^{\log_2 k} 2^{-i} x_i \le n$.

We now show that $r$ can be chosen such that $\sum_{i=0}^{r} x_i < 2^r \cdot \frac{2n}{\log_2 k}$, meaning
that OPT will pack more than $\frac{1}{2} \log_2 k$ times as many items as the on-line
algorithm. Defining $S_j = 2^{-j} \sum_{i=0}^{j} x_i$, this statement can be reformulated as
$\exists r \in \{0, 1, \ldots, \log_2 k\} : S_r < \frac{2n}{\log_2 k}$, which is proven by the following inequality.

$$\sum_{j=0}^{\log_2 k} S_j = \sum_{0 \le i \le j \le \log_2 k} 2^{-j} x_i < \sum_{i=0}^{\log_2 k} 2 \cdot 2^{-i} x_i \le 2n. \qquad \Box$$
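
The averaging step of the proof can be checked numerically. The sketch below is only an illustration: the acceptance profile `x` is a hypothetical on-line behaviour (an equal share of the volume spent in every phase), and the function names are not from the paper.

```python
from fractions import Fraction


def prefix_averages(x):
    """Return S_j = 2^{-j} * (x_0 + ... + x_j) for every phase j."""
    total, out = Fraction(0), []
    for j, xj in enumerate(x):
        total += xj
        out.append(total / 2 ** j)
    return out


def worst_stopping_phase(x):
    """Phase r at which the adversary should stop, together with S_r."""
    s = prefix_averages(x)
    r = min(range(len(s)), key=lambda j: s[j])
    return r, s[r]


n, log_k = 6, 4  # k = 16, phases 0..4
# Hypothetical profile: the volume constraint sum 2^{-i} x_i <= n holds with equality.
x = [Fraction(n * 2 ** i, log_k + 1) for i in range(log_k + 1)]
r, s_r = worst_stopping_phase(x)
```

Since the values S_j sum to less than 2n over log_2 k + 1 phases, the smallest one is below 2n / log_2 k, which is what the adversary exploits when it stops after phase r.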

3 The Competitive Ratio on Accommodating Sequences 

3.1 An Upper Bound 

Now we turn to the competitive ratio on accommodating sequences. In 0, it
was shown that for $k \ge 7$, any deterministic Fair Bin Packing algorithm has a
competitive ratio on accommodating sequences of at most $\frac{6}{7}$. The same result
and essentially the same proof hold when the fairness restriction is removed,
even for randomized algorithms.

Theorem 3. For $k \ge 7$, any deterministic or randomized Unrestricted Bin
Packing algorithm has a competitive ratio of at most $\frac{6}{7}$, even on accommodating
sequences.

Proof. Assume $n$ is even. Consider an arbitrary on-line algorithm $A$. An adversary
can proceed as follows: Give $n$ items of size $\lceil \frac{k}{2} \rceil - 1$, and let $q$ denote the
number of bins which contain two items after this. In the case where $E[q] < \frac{2n}{7}$,
the adversary gives $\frac{n}{2}$ long requests of size $k$. The off-line algorithm can pack
the first $n$ requests in the first $\frac{n}{2}$ bins and thus accept all $\frac{3n}{2}$ items. On average,
the on-line algorithm places two items in $E[q]$ bins and has at most one item in
every other bin. The performance ratio is thus at most $\frac{n + E[q]}{3n/2} < \frac{6}{7}$.

In the case where $E[q] \ge \frac{2n}{7}$, the adversary gives $n$ requests of size $\lfloor \frac{k}{2} \rfloor + 1$.
The off-line algorithm can pack the first $n$ items one per bin and thus accept all
$2n$ items. The on-line algorithm must reject at least $E[q]$ items on average. The
performance ratio is thus at most $\frac{2n - E[q]}{2n} \le \frac{6}{7}$. □
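
The two cases of the adversary can be tabulated for a deterministic value of q. The helper `adversary_bound` is an illustrative name; it simply evaluates the two ratios from the proof, with the threshold 2n/7 deciding which continuation the adversary uses.

```python
from fractions import Fraction


def adversary_bound(q, n):
    """Upper bound on the on-line/off-line ratio when q bins hold two items."""
    if Fraction(q) < Fraction(2 * n, 7):
        # n/2 items of size k follow; on-line accepts at most n + q of 3n/2.
        return Fraction(n + q) / Fraction(3 * n, 2)
    # n items of size floor(k/2) + 1 follow; on-line rejects at least q of 2n.
    return Fraction(2 * n - q, 2 * n)


n = 14
# At most n/2 bins can hold two of the first n items.
best_for_algorithm = max(adversary_bound(q, n) for q in range(n // 2 + 1))
```

Whatever q the on-line algorithm ends up with, the bound never exceeds 6/7, and it is attained at q = 2n/7.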

3.2 Unfair-First-Fit 

The Algorithm. In Section 2 it was shown that there is an algorithm for
Unrestricted Bin Packing which has a better competitive ratio than any algorithm
for Fair Bin Packing. It would be difficult to do the same for the competitive ratio
on accommodating sequences, since the best upper bound known is $\frac{6}{7}$ for both
problems. First-Fit’s competitive ratio on accommodating sequences is known
to lie between $\frac{5}{8}$ and $\frac{7}{11}$, and no algorithm for Fair Bin Packing is known
to have a better competitive ratio on accommodating sequences. The algorithm
Unfair-First-Fit (UFF), presented below, is shown to have a competitive ratio on
accommodating sequences which is better than that of First-Fit as long as the
number of bins is at least 22; the ratio approaches $\frac{2}{3}$ as $n$ increases. What makes
Unfair-First-Fit different from First-Fit is that items larger than $\frac{k}{2}$ are rejected
if enough items have been accepted already to maintain the desired ratio of $\frac{2}{3}$.
Input: S = ⟨o_1, o_2, ..., o_n⟩
Output: A, R, and a packing for those items in A

A := {o_1}; R := {}; S := tail(S)
while S ≠ ⟨⟩
    o := hd(S); S := tail(S)
    if size(o) > k/2 and |A| / (|A| + |R| + 1) ≥ 2/3
        R := R ∪ {o}
    else if there is space for o in some bin
        place o according to the First-Fit rule
        A := A ∪ {o}
    else
        R := R ∪ {o}
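
A runnable sketch of the pseudocode above, under the reading that a large item is rejected when rejecting it keeps the fraction |A| / (|A| + |R| + 1) at 2/3 or more. The function name `unfair_first_fit` is illustrative; the first item needs no special case here, because with |A| = 0 the ratio test fails and the item is placed by First-Fit.

```python
def unfair_first_fit(items, n, k):
    """Return (accepted, rejected) sizes for n bins of capacity k."""
    load = [0] * n
    accepted, rejected = [], []
    for size in items:
        large = 2 * size > k
        # Integer form of |A| / (|A| + |R| + 1) >= 2/3.
        ratio_kept = 3 * len(accepted) >= 2 * (len(accepted) + len(rejected) + 1)
        if large and ratio_kept:
            rejected.append(size)          # unfair rejection
            continue
        for b in range(n):
            if load[b] + size <= k:        # First-Fit rule
                load[b] += size
                accepted.append(size)
                break
        else:
            rejected.append(size)          # no bin has room
    return accepted, rejected


accepted, rejected = unfair_first_fit([6, 6, 6, 4, 6], n=2, k=10)
```

In this run the third item is rejected although it is large only because two of three items would still be accepted, while the last item is rejected for lack of space.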



The Competitive Ratio on Accommodating Sequences. 

Theorem 4. For $n \ge 9$, the competitive ratio of Unfair-First-Fit on accommodating
sequences is more than $\frac{2}{3} - \frac{4}{6n + 3}$. Thus, for $n \ge 22$, $AR_{UFF} > AR_{FF}$.

Proof. The term “large” is used for items strictly larger than $\frac{k}{2}$, since they are
considered in a special way by the algorithm. Let $B$ denote the set of large items
that are alone in a bin in UFF’s packing. Let $s$ denote the size of the smallest
item in $R$. We divide the proof into two cases depending on the size of $s$. The
first case is easy.

Case 1: $s > \frac{k}{2}$. Since the smallest item in $R$ is larger than $\frac{k}{2}$, the items
in $R \cup B$ are all larger than $\frac{k}{2}$. Thus, since all items can be packed in $n$ bins,
$|R| + |B| \le n$, or $|R| \le n - |B|$. Furthermore, at most one small item can be
alone in a bin: $|A| \ge 2n - |B| - 1$. Thus, the performance ratio is

$$\frac{|A|}{|A| + |R|} \ge \frac{2n - |B| - 1}{2n - |B| - 1 + n - |B|} \ge \frac{2n - 1}{3n - 1} = \frac{2}{3} - \frac{1}{9n - 3}.$$

Case 2: $s \le \frac{k}{2}$. Since we consider the competitive ratio on accommodating
sequences, an optimal off-line algorithm, OPT, can pack all items in $S$. It may
be instructive to view the optimal packing as being done in 3 phases:

1. UFF is run on $S$.

2. The packed items are rearranged, creating room for the rejected items.

3. The rejected items are packed.

The packing after Phase 1 is denoted by $P_{UFF}$, and the packing after Phase
3 is denoted by $P_{OPT}$. Similarly, $E_{UFF}$ and $E_{OPT}$ are used to denote the total
empty space after Phase 1 and Phase 3 respectively. We assume without loss of
generality that no large item is moved during Phase 2.

We divide the rejected items into two disjoint sets: $R_b$, which contains large
items, and $R_s$, which contains small items. We use the following inequality to
bound the number of small items rejected.

$$|R_s| \le \frac{1}{s}\left(E_{UFF} - E_{OPT} - \frac{k}{2}|R_b|\right)$$

It is easy to see that $|R| \le n$, since the empty space in any bin in $P_{UFF}$ is
less than $s$ and all rejected items have size at least $s$. Thus, if all bins contain
at least two items each, $\frac{|A|}{|A| + |R|} \ge \frac{2n}{2n + n} = \frac{2}{3}$, and we are through. Therefore,
assume that some bins contain only one item. Since the empty space in any bin
is less than $\frac{k}{2}$, such items must be large. Thus, the items that are alone in a bin
are exactly the items in $B$.

It is now clear that $|A| \ge 2n - |B|$. However, if some bins contain more
than two items, this lower bound is too pessimistic. Therefore, we try to “spread
out” the items a little more. Assume that the items in $P_{UFF}$ are labeled with
consecutive numbers in each bin according to their arrival time, i.e., the first
item in a bin is labeled 1, the next one is labeled 2, and so on. We split Phase 2
into two Subphases, 2A and 2B, such that in Subphase 2A only items with labels
higher than 2 are moved and in Subphase 2B the remaining moves are performed.
Note that the packing produced during Subphase 2A is only technical and used
for counting purposes; it might be illegal in that some bins might contain a total
volume larger than $k$.

If some of the items moved during Subphase 2A are moved to bins containing
items from $B$, a better lower bound on $|A|$ can now be obtained (Lemma 1). The
set of items that are still alone after Subphase 2A is divided into two sets: $X$,
containing the items that are still alone after Subphase 2B, and $L$, containing
those that are not. Any item that is alone after Subphase 2A was alone in $P_{UFF}$
as well. Since no such item can be combined with an item belonging to $R$, each
item in $X$ is also alone in $P_{OPT}$. Therefore, the bins containing an item from $X$
do not contribute to $E_{UFF} - E_{OPT}$.



Lemma 1. $|A| \ge 2n - |L| - |X|$.

Proof. $L \cup X$ is the set of objects that are alone after Subphase 2A. □

The following easy lemma is used to prove Lemma 3 below which, loosely
speaking, shows that if we cannot guarantee that most of the bins contain at
least two items after Subphase 2A, then much of the empty space in $P_{UFF}$ is
used by large rejected items.

Let $t$ denote the time just after the last large item was accepted by UFF and
let $A_t$ denote the set of items accepted at time $t$.

Lemma 2. $|R_b| \ge \frac{1}{2}|A_t| - 1$.

Proof. Since a large item was accepted just before time $t$, all items previously
rejected are large items and therefore contained in $R_b$. Since the item was
accepted, $\frac{|A_t| - 1}{|A_t| + |R_b|} < \frac{2}{3}$. Solving for $|R_b|$, we get $|R_b| > \frac{1}{2}|A_t| - \frac{3}{2}$, and since
$|R_b|$ must be integer, we get $|R_b| \ge \frac{1}{2}|A_t| - 1$. □

Assume that at time $t$ all small items accepted by UFF are marked.

Lemma 3. $|R_b| \ge |L| + \frac{1}{2}|X| - 1$.

Proof. It is shown that $|A_t| \ge 2|L| + |X|$, which will complete the proof, since,
by Lemma 2, $|R_b| \ge \frac{1}{2}|A_t| - 1$. To each item $o \in L$, a marked item is assigned
in the following way. Since no item in $L$ is alone after Phase 2, we can assume
that the bin $b_o$ containing $o$ will receive at least one item, $o'$, labeled 1 or 2
during Phase 2. If $o'$ is marked, it is assigned to $o$. Otherwise, it must be labeled
2, since all items labeled 1 in bins before $b_o$ are marked. The item which was
packed below $o'$ in $P_{UFF}$ was alone at time $t$. Therefore, this item is not moved
to any item in $L$. This item (labeled 1) can be assigned to $o$. In this way, every
item in $L$ has an item assigned which arrived before time $t$ and which is not in
$L \cup X$. Since $L \cup X \subseteq A_t$, $|A_t| \ge 2|L| + |X|$. □



Subcase 2a: $s \le \frac{k}{3}$. Since the smallest item in $R$ has size $s$, the empty space in
each bin in $P_{UFF}$ is smaller than $s$. Thus, we can use $s(n - |X|)$ as an upper
bound on $E_{UFF} - E_{OPT}$.

$$|R_s| \le \frac{1}{s}\left(E_{UFF} - E_{OPT} - \frac{k}{2}|R_b|\right) \le \frac{1}{s}\left(s(n - |X|) - \frac{k}{2}|R_b|\right) = n - |X| - \frac{k}{2s}|R_b| \le n - |X| - \frac{3}{2}|R_b|.$$

Now, using Lemma 3 we get

$$|R| = |R_s| + |R_b| \le n - |X| - \frac{1}{2}|R_b| \le n - |X| - \frac{1}{2}\left(|L| + \frac{1}{2}|X| - 1\right).$$

Thus,

$$\frac{|A|}{|A| + |R|} \ge \frac{2n - |L| - |X|}{2n - |L| - |X| + \left(n - \frac{5}{4}|X| - \frac{1}{2}|L| + \frac{1}{2}\right)} \ge \frac{2n - (|L| + |X|) + \frac{1}{3}}{3n - \frac{3}{2}(|L| + |X|) + \frac{1}{2}} - \frac{\frac{1}{3}}{3n - \frac{3}{2}(|L| + |X|) + \frac{1}{2}} \ge \frac{2}{3} - \frac{2}{12n - 3},$$

since $|L| + |X| \le \frac{2}{3}(n + 1)$, which follows from the fact that the number of large
items is at most $n$: $n \ge |R_b| + |L| + |X| \ge (|L| + \frac{1}{2}|X| - 1) + |L| + |X| \ge
\frac{3}{2}(|L| + |X|) - 1$.

Subcase 2b: $\frac{k}{3} < s \le \frac{k}{2}$. In this case, $s(n - |X|)$ is not a good bound on
$E_{UFF} - E_{OPT}$, but we will show that even in this case, $E_{UFF} - E_{OPT}$ is “almost” bounded
by $\frac{k}{3}(n - |X|)$, if $n \ge 9$, and Lemma 4 below is used for this purpose.

Lemma 4. Let $m$ be the number of bins containing at least $c$ items in a First-Fit
packing. If $c \ge 1$ and $m \ge c + 1$, then the volume $V$ of the items in these $m$
bins is more than $\frac{c}{c+1} m k$.

Proof. Let $C$ denote the set of bins containing at least $c$ items, and, for any bin
$b$, let $V(b)$ denote the sum of the sizes of the items in $b$.

Suppose, for the sake of contradiction, that $V \le \frac{c}{c+1} m k$. Then there is a bin
$b \in C$ such that $V(b) = \frac{c}{c+1} k - \varepsilon$, $\varepsilon \ge 0$. The size of any item placed in a bin to
the right of $b$ must be greater than $\frac{1}{c+1} k + \varepsilon$, since otherwise it would fit in $b$.
Therefore any bin $b' \in C$ to the right of $b$ has $V(b') > \frac{c}{c+1} k + c\varepsilon \ge \frac{c}{c+1} k$. This
means that there is only one bin $b \in C$ with $V(b) \le \frac{c}{c+1} k$, and if $b$ is not the
rightmost nonempty bin in $C$, then
$V > (m - 2)\frac{c}{c+1} k + \left(\frac{c}{c+1} k - \varepsilon\right) + \left(\frac{c}{c+1} k + c\varepsilon\right) \ge m \frac{c}{c+1} k$.
Thus, $b$ must be the rightmost nonempty bin in $C$.

One of the items in $b$ must have size at most $\frac{1}{c+1} k - \frac{\varepsilon}{c}$. Since this item was not
placed in one of the $m - 1$ bins to the left of $b$, these must all be filled to more than
$\frac{c}{c+1} k + \frac{\varepsilon}{c}$. Thus,
$V > (m - 1)\left(\frac{c}{c+1} k + \frac{\varepsilon}{c}\right) + \left(\frac{c}{c+1} k - \varepsilon\right) = m \frac{c}{c+1} k + (m - 1)\frac{\varepsilon}{c} - \varepsilon \ge m \frac{c}{c+1} k + c \cdot \frac{\varepsilon}{c} - \varepsilon = m \frac{c}{c+1} k$,
which is a contradiction. □

Assuming $n \ge 9$, Lemma 4 combined with Lemma 5 below says that the
average empty space in bins containing more than one item can be assumed to
be at most $\frac{k}{3}$.

Lemma 5. Assume that $n \ge 9$ and $s \le \frac{k}{2}$. Then, in $P_{UFF}$, at least three bins
contain two or more items.

Proof. Assume for the sake of contradiction that fewer than three bins contain
at least two items. Since $s \le \frac{k}{2}$, no bin contains a single item of size at most
$\frac{k}{2}$. Therefore, at least $n - 2$ bins contain large items, which all arrived before
time $t$, i.e., $|A_t| \ge n - 2$. By Lemma 2, at least $\frac{1}{2}|A_t| - 1$ large items are rejected.
Adding these up and noting that there can be at most $n$ large items, we get
$n - 2 + \frac{n - 2}{2} - 1 \le n$. Solving for $n$ yields $n \le 8$, which is a contradiction. □

Our goal is now, roughly speaking, to show that the average empty space in
all $n$ bins is bounded by approximately $\frac{k}{3}$. Number the bins from left to right,
and let $l$ be the number of the bin in which the last large item was placed.
Let $e$ denote the largest empty space in bins containing an item from $B$. In
the proof of Lemma 7 we will show a lower bound on the number of bins to
the right of $l$ of approximately $\frac{|B|}{2}$. Each of these bins contains at least two
items of size larger than $e$. Thus, even if $e > \frac{k}{3}$, the average empty space in the
$B$-bins and the bins to the right of $l$ will be bounded above by approximately
$\left(|B| e + (k - 2e)\frac{|B|}{2}\right) / \left(\frac{3}{2}|B|\right) = \frac{k}{3}$. Lemma 4 combined with Lemma 6
below says that we can assume that the rest of the bins have an average empty
space of at most $\frac{k}{3}$.

Lemma 6. Assume that $n \ge 9$, $s \le \frac{k}{2}$, $e > \frac{k}{3}$, and $\frac{|A|}{|A| + |R|} < \frac{2}{3}$. Then, in $P_{UFF}$
at least three of the first $l$ bins contain two or more items.

Proof. We count the total number of items of size larger than $e$. Since $|A| \ge
2n - |B|$, more than $n - \frac{|B|}{2}$ items are rejected, because otherwise we have a
performance ratio of $\frac{2}{3}$, which is a contradiction. After bin $l$, there are $n - l$ bins
containing at least two items each. All of the rejected items and those in the last
$n - l$ bins are larger than $e$ and there are more than $n - \frac{|B|}{2} + 2(n - l)$ of them. Bins
containing items from $B$ cannot accept any of these items, and only two can be
put together since $e > \frac{k}{3}$. Thus, $n - \frac{|B|}{2} + 2(n - l) < 2(n - |B|)$. Solving for $l$, we
get $l > \frac{n}{2} + \frac{3}{4}|B|$. This shows that at least $\frac{n}{2} - \frac{|B|}{4}$ bins to the left of $l$ contain two
or more items. By Lemma 5, $|B| \le n - 3$. Thus, $\frac{n}{2} - \frac{|B|}{4} \ge \frac{n}{2} - \frac{n - 3}{4} \ge 3$,
since $n \ge 9$. □

Lemma 7. Assume that $n \ge 9$, $s \le \frac{k}{2}$, and $\frac{|A|}{|A| + |R|} < \frac{2}{3}$. Then,
$E_{UFF} - E_{OPT} \le (n - |X|)\frac{k}{3} + \frac{k}{2}$.

Proof. In the case where $e \le \frac{k}{3}$, we have an upper bound of $\frac{k}{3}$ on the average
empty space in bins with one item as well as bins with more items. Thus,
$E_{UFF} - E_{OPT} \le (n - |X|)\frac{k}{3}$. Now, assume that $e > \frac{k}{3}$. First we show an upper bound on $l$.
At time $t$ no two bins can contain only one small item each. Therefore,
$|A_t| \ge 2l - |B| - 1$. The total number of large items is
$|R_b| + |B| \ge \frac{1}{2}|A_t| - 1 + |B| \ge l + \frac{|B|}{2} - \frac{3}{2}$.
Since OPT must pack all these items in separate bins, we have $l + \frac{|B|}{2} - \frac{3}{2} \le n$.

Define $z \ge 0$ such that $n - l = z + \frac{|B|}{2} - \frac{3}{2}$. Since every bin after bin $l$ has two items
of size greater than $e$, we have the following upper bound on the empty space in
these $n - l$ bins and the bins with an item from $B \setminus X$:

$$e(|B| - |X|) + (k - 2e)(n - l) = e|B| - e|X| + (k - 2e)\left(z + \frac{|B|}{2} - \frac{3}{2}\right) \le e|B| - \frac{k}{3}|X| + (k - 2e)\frac{|B|}{2} + (k - 2e)\left(z - \frac{3}{2}\right)$$
$$= \frac{k|B|}{2} - \frac{k}{3}|X| + (k - 2e)\left(z - \frac{3}{2}\right) \le \frac{k|B|}{2} - \frac{k}{3}|X| + (k - 2e)z \le \frac{k|B|}{2} - \frac{k}{3}|X| + \frac{k}{3}z.$$

Among the remaining bins, $l - |B| = n - z - \frac{3}{2}|B| + \frac{3}{2}$ bins do not contain
an item from $X$. All of these bins have at least two items, and according to
Lemma 6 enough of these bins exist for us to conclude, by Lemma 4, that the
empty space is at most $\frac{k}{3}\left(n - z - \frac{3}{2}|B| + \frac{3}{2}\right)$. The total empty space is then less
than $\frac{k|B|}{2} - \frac{k}{3}|X| + \frac{k}{3}z + \frac{k}{3}\left(n - z - \frac{3}{2}|B| + \frac{3}{2}\right) = \left(n - |X| + \frac{3}{2}\right)\frac{k}{3}$. □

Then, by Lemma 7, if $n \ge 9$,

$$|R_s| \le \frac{1}{s}\left(E_{UFF} - E_{OPT} - \frac{k}{2}|R_b|\right) \le \frac{1}{s}\left((n - |X|)\frac{k}{3} + \frac{k}{2} - \frac{k}{2}|R_b|\right) \le n - |X| + \frac{3}{2} - \frac{3}{2}|R_b|.$$

Using Lemma 3 as in Subcase 2a, we get

$$|R| \le n - |X| + \frac{3}{2} - \frac{1}{2}\left(|L| + \frac{1}{2}|X| - 1\right) = n - \frac{5}{4}|X| - \frac{1}{2}|L| + 2, \quad \text{for } n \ge 9.$$

Thus,

$$\frac{|A|}{|A| + |R|} \ge \frac{2n - |L| - |X|}{2n - |L| - |X| + n - \frac{5}{4}|X| - \frac{1}{2}|L| + 2} \ge \frac{2n - (|L| + |X|) + \frac{4}{3}}{3n - \frac{3}{2}(|L| + |X|) + 2} - \frac{\frac{4}{3}}{3n - \frac{3}{2}(|L| + |X|) + 2} \ge \frac{2}{3} - \frac{4}{6n + 3}.$$

This bound is lower than the lower bounds obtained in Case 1 and Subcase 2a
for all $n$. It is shown in 0 that $\frac{7}{11}$ is an upper bound on FF’s competitive ratio
on accommodating sequences. For $n \ge 22$, $\frac{2}{3} - \frac{4}{6n + 3} > \frac{7}{11}$. Thus, for $n \ge 22$,
UFF has a better competitive ratio than FF on accommodating sequences. □

Remark: It is easy to see that UFF’s competitive ratio is $\frac{1}{k}$. If it is less than
$\frac{2}{3}$, then $R$ is nonempty, so at least $n$ items are accepted. OPT can accept at
most $nk$ items, so the competitive ratio is at least $\frac{1}{k}$. For the upper bound, if
$\frac{3}{2}n$ items of size $k$ followed by $nk$ items of size 1 are given, UFF will accept $n$
items of size $k$, while OPT will accept all of the small ones, giving a ratio of $\frac{1}{k}$.
Note that this means that $\mathcal{A}_{UFF}(\alpha) = \frac{1}{k}$ for $\alpha \ge \frac{5}{2}$. Furthermore, if $2n$ items of
size $\frac{k}{2}$ are given, followed by $(\alpha - 1)nk$ items of size 1, UFF will accept $2n$ items
of size $\frac{k}{2}$, while OPT can accept $2n + (\alpha - 1)n(k - 2)$ items, giving a ratio of
$\frac{2}{2 + (\alpha - 1)(k - 2)}$. Thus, for any constant $c > 0$, $\mathcal{A}_{UFF}(\alpha) \in \Theta(\frac{1}{k})$, if $\alpha \ge 1 + c$.

4 The Accommodating Function

Suppose that, for each sequence $I$ of items, the on-line algorithm knows,
beforehand, the number $\alpha n$ of bins needed to pack the items in $I$ (or a good upper
bound on $\alpha$). Then an accommodating function can be achieved for which the
function value is constant (that is, independent of $k$ and $n$) when evaluated at
a constant $\alpha$.


4.1 A Randomized Algorithm 

One way of exploiting the extra knowledge is to use $\alpha n$ “virtual” bins. At the
beginning the randomized algorithm $\mathbb{R}$ randomly decides which $n$ of the $\alpha n$
virtual bins are going to correspond to the “real” $n$ bins. Call the set of these
$n$ virtual bins $B_a$ and the rest of the $\alpha n$ virtual bins $B_r$. An algorithm $A$ with
a “good” competitive ratio on accommodating sequences $AR_A$ is used to decide
where the actual items would be packed in the $\alpha n$ virtual bins. When $A$ packs
an item in a bin in $B_a$, the algorithm $\mathbb{R}$ accepts the item and places it in the
corresponding real bin. All other items are rejected.

The expected fraction of the items which $\mathbb{R}$ accepts is at least $\frac{AR_A}{\alpha}$, since on
average $\frac{1}{\alpha}$ of the items accepted by $A$ will be packed in $B_a$.
Using Unfair-First-Fit, this gives $\mathcal{A}(\alpha) \ge \frac{2}{3\alpha}$ (asymptotically), which is constant
when $\alpha$ is.

Another way of using virtual bins is to use an algorithm that is known to be
able to pack any 1-sequence of items in $\beta n$ bins for some constant $\beta$. In this case,
$\alpha \beta n$ virtual bins are used. According to 0, for the algorithm Harmonic+1, $\beta \le
1.588720$. Using Harmonic+1 for packing items in the virtual bins and randomly
choosing the $n$ bins for $B_a$ gives $\mathcal{A}(\alpha) \ge \frac{1}{\alpha\beta} \approx \frac{0.629}{\alpha}$. According to 0, even
for randomized algorithms, $\beta \ge 1.536$. Since $\frac{1}{1.536} \approx 0.651$, this approach cannot
give an accommodating function as good as the method described above using
$\alpha n$ virtual bins can.

Remark: Amos Fiat m has noted that the technique described above can be
used more generally, for many maximization problems, to give good values for
the accommodating function when $\alpha$ is small. If an algorithm $A$ with competitive
ratio on accommodating sequences $AR_A$ is used with a quantity $\alpha n$ of the virtual
resource, and a quantity $n$ of these virtual resources are randomly chosen and
used on the real resources, then the algorithm will achieve an accommodating
function of $\mathcal{A}(\alpha) \ge \frac{AR_A}{\alpha}$.
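
The virtual-bin construction of this section can be sketched as follows. This is an illustration under stated assumptions: the inner packing algorithm is plain First-Fit rather than Unfair-First-Fit, `alpha` is taken to be an integer, and the names `first_fit_bins` and `virtual_bin_accept` are hypothetical.

```python
import random


def first_fit_bins(items, bins, k):
    """First-Fit into `bins` bins of capacity k; bin index per item, or None."""
    load, where = [0] * bins, []
    for size in items:
        for b in range(bins):
            if load[b] + size <= k:
                load[b] += size
                where.append(b)
                break
        else:
            where.append(None)           # rejected by the inner algorithm
    return where


def virtual_bin_accept(items, n, alpha, k, rng):
    """Accept exactly the items the inner algorithm puts into the chosen real bins."""
    virtual = alpha * n
    real = set(rng.sample(range(virtual), n))   # the set B_a, chosen at random
    where = first_fit_bins(items, virtual, k)
    return [size for size, b in zip(items, where) if b in real]


accepted = virtual_bin_accept([5] * 6, n=3, alpha=2, k=5, rng=random.Random(0))
```

For clarity the sketch processes the whole list at once, but the decision for each item depends only on earlier items, so it can be run on-line. In the example every virtual bin receives one item, so exactly n of the 2n items are accepted whichever bins are chosen.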



4.2 A Deterministic Algorithm 

It is also possible for a deterministic algorithm to have an accommodating
function whose value is constant (that is, independent of $k$ and $n$) when
evaluated at a constant $\alpha$, as long as $n \ge 5$. The following algorithm $\mathbb{D}$ has
this property.

$\mathbb{D}$ divides the possible item sizes into $\lceil \log_2 k \rceil$ intervals, $S_1, S_2, \ldots, S_{\lceil \log_2 k \rceil}$,
defined by $S_1 = \{x \mid \frac{k}{2} \le x \le k\}$, and $S_i = \{x \mid \frac{k}{2^i} \le x < \frac{k}{2^{i-1}}\}$, for $2 \le i \le
\lceil \log_2 k \rceil$. Thus, for any two items with sizes $s_a$ and $s_b$ belonging to the same size
interval, $s_a \ge \frac{1}{2} s_b$.

For each $i$, $1 \le i \le \lceil \log_2 k \rceil$, $\mathbb{D}$ does the following. It accepts the first item
with size $s \in S_i$. After that it accepts every $\frac{\alpha}{\beta}$th item with size $s \in S_i$, for a
given constant $\beta$, and rejects all other items with sizes in $S_i$. The accepted items
are packed according to the First-Fit packing rule and the constant $\beta$ will be
chosen as described below, so that $\mathbb{D}$ has no problem doing so. Since $\mathbb{D}$ accepts
every $\frac{\alpha}{\beta}$th item in each size interval, $\mathcal{A}(\alpha) \ge \frac{\beta}{\alpha}$.

Let $O$ be the set of all the items given, let $O_f$ be the set of items consisting
of the first item in each size interval and let $O' = O \setminus O_f$. Let $A$ be the set of
items accepted by $\mathbb{D}$ and let $A' = A \setminus O_f$. For any set $S$ of items, let the volume
of $S$, denoted by $V(S)$, be the sum of the sizes of the items in $S$.

It follows from Lemma 4 that the volume of the items in any First-Fit packing
using $n$ bins is more than $\frac{nk}{2}$. Thus, if $\beta$ is chosen such that $V(A) \le \frac{nk}{2}$, $\mathbb{D}$ will
be able to pack all the accepted items.

To determine an appropriate value for $\beta$, first notice that $V(O') \le V(O) \le
\alpha n k$, since all the items can fit in $\alpha n$ bins, and $V(O') \ge \frac{1}{2}\frac{\alpha}{\beta} V(A')$, since for
every item $o \in A'$, $\frac{\alpha}{\beta} - 1$ items, each of size $s \ge \frac{1}{2}\mathrm{size}(o)$, have been rejected.
Combining these inequalities gives $\frac{1}{2}\frac{\alpha}{\beta} V(A') \le \alpha n k$, and solving for $V(A')$ yields
$V(A') \le 2\beta n k$.

Furthermore, $V(O_f) \le \sum_{i=0}^{\lceil \log_2 k \rceil - 1} \frac{k}{2^i} < \sum_{i=0}^{\infty} \frac{k}{2^i} = 2k$.

We now have that $V(A) = V(A') + V(O_f) \le 2\beta n k + 2k$. To obtain $2\beta n k +
2k \le \frac{nk}{2}$, $n$ must be at least 5, for any $\beta > 0$. For $n \ge 5$, $\beta = \frac{1}{20}$ assures that
$V(A) \le \frac{nk}{2}$. If we accept that $n$ must be at least 10, then $\beta = \frac{3}{20}$ can be used.
Thus, if $n \ge 5$, $\mathcal{A}(\alpha) \ge \frac{1}{20\alpha}$, and if $n \ge 10$, $\mathcal{A}(\alpha) \ge \frac{3}{20\alpha}$.
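
The acceptance rule of the deterministic algorithm can be sketched as below. The integer parameter `period` is a hypothetical stand-in for the rate at which items of one size interval are accepted, and `size_class` and `algorithm_d` are illustrative names rather than the paper's notation.

```python
def size_class(x, k):
    """Index i of the size interval containing x (interval 1 holds k/2 <= x <= k)."""
    i = 1
    while x * 2 ** i < k:
        i += 1
    return i


def algorithm_d(items, n, k, period):
    """Accept the first item of each size class and every `period`-th one after it."""
    seen = {}                       # items seen so far, per size class
    load = [0] * n
    accepted = []
    for size in items:
        i = size_class(size, k)
        count = seen.get(i, 0)
        seen[i] = count + 1
        if count % period != 0:
            continue                # rejected without consulting the bins
        for b in range(n):          # accepted items are packed by First-Fit
            if load[b] + size <= k:
                load[b] += size
                accepted.append(size)
                break
    return accepted


units = algorithm_d([1] * 10, n=5, k=16, period=4)
mixed = algorithm_d([16, 1, 16, 1], n=5, k=16, period=2)
```

Of ten unit items with period 4, the first, fifth and ninth are accepted. In the mixed sequence each size class contributes only its first item.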
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Abstract. In this paper we consider the following problem. Given is 
a d-claw free graph $G = (V, E, w)$ where $w : V \to \mathbb{R}^+$. Our algorithm
finds an independent set $A$ such that $w(A^*)/w(A) \le d/2$ where $A^*$ is an
independent set that maximizes $w(A^*)$. The previous best polynomial time
approximation algorithm obtained $w(A^*)/w(A) \le 2d/3$.



1 Introduction 

In an undirected graph a d-claw $C$ is an induced subgraph that consists of
an independent set $T_C$ of $d$ nodes, called talons, and the center node that is
connected to all the talons. A graph is d-claw free if it possesses no d-claws.
For convenience, we define a 1-claw to be a singleton set $C$ with $T_C = C$. We also
define the center set of a claw $C$ as $Z_C = C - T_C$.

The d-claw free graphs are studied for two reasons. One is that these graphs
appear in many applications. In particular, we often consider graphs in which
the set of nodes is a family of sets, and edges indicate non-empty set
intersections. If sets in the family have fewer than $d$ elements, the graph is
d-claw free. Other examples include families of oriented squares of unit size, which
form 5-claw free graphs, and families of unit size circles which form 7-claw free
graphs.

Another reason is that d-claw free graphs form the broadest natural family 
of graphs where algorithms for the Maximum Independent Set problem (MIS 
for short) have constant approximation ratio. Even this very simple algorithm 
assures ratio d — 1 (we always assume that (V, E) is the input graph): 

definition

    N(K, L) = {u ∈ L : ∃v ∈ K such that {u, v} ∈ E or u = v}

Greedy

    A ← ∅
    while V − N(A, V) ≠ ∅
        choose u ∈ V − N(A, V)
        A ← A ∪ {u}

One can generalize the MIS problem by introducing a weight function $w : V \to \mathbb{R}^+$.
The objective of w-MIS is to find an independent set $A$ with maximum $w(A)$.
The above Greedy algorithm achieves the same approximation ratio for w-MIS,
once we change the greedy selection as follows:

    choose u ∈ V − N(A, V) with the maximum w(u)
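
A minimal sketch of the weighted Greedy rule, assuming the graph is given as a dictionary `adj` from each node to its set of neighbours and the weights as a dictionary `w`. These representations and the name `greedy_wmis` are choices made here for illustration. Scanning nodes by decreasing weight and skipping blocked ones is equivalent to repeatedly choosing a maximum-weight node of V − N(A, V).

```python
def greedy_wmis(adj, w):
    """Greedy independent set: heaviest available node first."""
    chosen, blocked = set(), set()
    for u in sorted(adj, key=lambda v: w[v], reverse=True):
        if u not in blocked:
            chosen.add(u)
            blocked |= adj[u] | {u}     # N({u}, V) leaves the candidate pool
    return chosen


# A star with a heavy center: Greedy takes the center, the optimum takes the leaves.
adj = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
w = {"c": 2, "x": 1, "y": 1, "z": 1}
greedy_set = greedy_wmis(adj, w)
```

On this star Greedy returns weight 2 while the three leaves have weight 3, a small instance of the gap that the local improvements discussed next try to close.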

An obvious challenge is to find polynomial time algorithms with better
approximation. A natural idea is to apply small improvements. We say that a node
set $C$ improves $w(A)$ if $w((A - N(C, A)) \cup C) > w(A)$. The following algorithm
approximates MIS for d-claw free graphs with ratio $d/2$ (see inSBl):

SizeTwoImp

    A ← ∅
    while there exists {u, v} that improves |A|
        A ← A − N({u, v}, A) ∪ {u, v}

By increasing the size of allowed improvements, one can obtain polynomial
time algorithms with ratios approaching $(d - 1)/2$ m. However, it was not
obvious how to extend this idea to w-MIS in d-claw free graphs. Recently, Chandra
and Halldorsson ISHl have found that the following algorithm has ratio $\frac{2}{3} d$ ^:

BestImp

    A ← ∅
    while there exists claw C such that T_C improves w(A)
        if V − N(A, V) ≠ ∅
            choose u ∈ V − N(A, V) with maximum w(u), C ← {u}
        else
            choose claw C that maximizes w(T_C)/w(N(T_C, A))
        A ← A − N(T_C, A) ∪ T_C

Chandra and Halldorsson show how to modify BestImp so it runs in polynomial
time: (i) find an approximate solution $A$ using Greedy; (ii) rescale the
weight function so that $w(A) = k|V|$; (iii) run the algorithm BestImp for the
weight function $\lfloor w \rfloor$. Because each iteration increases $\lfloor w \rfloor(A)$ by at least 1, and
we cannot get $\lfloor w \rfloor(A) > w(A^*)$, there are fewer than $(d - 1)k|V|$ iterations. In
turn, in each iteration we inspect only a polynomial number of candidates for the
claw $C$. This assures that the new algorithm runs in polynomial time. Moreover,
the solution of the new algorithm satisfies $w(A) \ge \lfloor w \rfloor(A) \ge \lfloor w \rfloor(A^*)/(\frac{2}{3} d) \ge
w(A^*)/(\frac{k}{k-1} \times \frac{2}{3} d)$, thus the approximation ratio increases by factor $\frac{k}{k-1}$.

In this paper, we analyze the following algorithm and show that it provides
the same approximation ratio for w-MIS as SizeTwoImp for MIS:

SquareImp

    A ← ∅
    while there exists claw C such that T_C improves w²(A)
        A ← A − N(T_C, A) ∪ T_C
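
The local search above can be sketched by brute force on small graphs. The sketch assumes the same dictionary representation of the graph as a map from nodes to neighbour sets, with sortable node names. The bound `max_talons` and the helper names `n_of`, `talon_sets` and `square_imp` are introduced here for illustration. It enumerates talon sets directly and makes no attempt at the polynomial running time discussed in the text.

```python
from itertools import combinations


def n_of(K, L, adj):
    """N(K, L): nodes of L that are in K or adjacent to a node of K."""
    return {u for u in L if u in K or adj[u] & K}


def talon_sets(adj, max_talons):
    """Yield talon sets of 1-claws and of claws with up to max_talons talons."""
    for c in adj:
        yield frozenset([c])                        # 1-claw: T_C = C = {c}
        for size in range(1, max_talons + 1):
            for t in combinations(sorted(adj[c]), size):
                # Talons must be pairwise non-adjacent neighbours of c.
                if all(v not in adj[u] for u, v in combinations(t, 2)):
                    yield frozenset(t)


def square_imp(adj, w, max_talons):
    """Apply claw improvements while the sum of squared weights increases."""
    def sq(s):
        return sum(w[v] ** 2 for v in s)

    a = set()
    improved = True
    while improved:
        improved = False
        for t in talon_sets(adj, max_talons):
            candidate = (a - n_of(t, a, adj)) | t
            if sq(candidate) > sq(a):
                a, improved = candidate, True
                break
    return a


adj = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
w = {"c": 1, "x": 1, "y": 1, "z": 1}
result = square_imp(adj, w, max_talons=3)
```

Each accepted step strictly increases the sum of squared weights over a finite family of sets, so the loop terminates. On the unit-weight star the only local optimum is the set of leaves.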



^ The analysis of Chandra and Halldorsson holds only for the graphs formed from 
sets with fewer than d elements. 
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2 Analysis of SquareImp 



One can extend the analysis of the running time of BestImp to SquareImp.
While it does not have a polynomial bound on the running time, we can modify
this algorithm so that it runs in time $O(k^2 p(|V|))$ for some fixed polynomial $p$, while
increasing the approximation ratio by a factor of $\frac{k}{k-1}$. The only difference is that
we have a different estimate on the number of iterations. In particular, after we
rescale $w$ in step (ii), we have $w(A) = k|V|$ and consequently $w^2(A) \le k^2|V|^2$;
because Greedy for $w$ is also Greedy for $w^2$, this implies that $w^2(A^*) \le
(d - 1)k^2|V|^2$; thus our estimate on the number of iterations of the modified
SquareImp is higher than the estimate for the modified BestImp by a factor of $k|V|$.

To see that the approximation ratio of SquareImp is at least $d/2$, we may
construct a small example for each $d$, in which $w(u) = 1$ for every node. The
set of nodes in this example is a union of two independent sets, $A$ and $B$. Set
$A$ has $d - 1$ elements. Set $B$ consists of subsets of $A$ with 1 or 2 elements; thus
$|B| = (d-1)(d-2)/2 + d - 1 = (d-1)d/2 = \frac{d}{2}|A|$. For $u \in A$ and $v \in B$, $\{u, v\}$
is an edge if and only if $u \in v$. Algorithm SquareImp may start by picking,
one by one, elements of set $A$. It is easy to see that subsequently SquareImp
terminates because no claw improves $|A|$.

The above example also shows that we cannot improve SquareImp by
replacing $w^2$ with some other power of $w$.

To show that the approximation ratio is at most $d/2$, we will start from the
analysis of algorithm WishfulThinking. The name of this algorithm comes
from the fact that it is quite obvious that it delivers the desired approximation
ratio, but only under the assumption that it terminates. Later,
instead of analyzing the running time of WishfulThinking directly, we will
show that it cannot make more iterations than SquareImp while SquareImp
cannot have a larger approximation ratio.



definition
    N(u, A) = N({u}, A)
    n(u) is a node v ∈ N(u, A) with the maximum value of w(v)
    charge(u, v) = w(u) − 1/2 w(N(u, A))   if v = n(u)
                   0                        otherwise
    C is a good claw if either N(T_C, A) = ∅, or
        Z_C = {v} ⊂ A and Σ_{u ∈ T_C} charge(u, v) > 1/2 w(v)
    C is a nice claw if it is a minimal set that is a good claw

WishfulThinking
    A ← ∅
    while there exists a nice claw C
        A ← A − N(T_C, A) ∪ T_C



Lemma 1. Assume that WishfulThinking has terminated and that A* is an
independent set. Then w(A*)/w(A) ≤ d/2.

Proof. We will distribute w(A*) among the nodes of A in such a way that no
node v ∈ A receives more than (d/2) w(v). The distribution consists of two steps.
In the first, u ∈ A* sends to each v ∈ N(u, A) a portion of its weight equal to
1/2 w(v). Note that N(u, A) is non-empty, otherwise {u} is a nice claw. Also, in
this step u sends a portion of its weight equal to 1/2 w(N(u, A)), consequently
the portion of its weight that is not distributed yet equals charge(u, n(u)). In
the second step u sends charge(u, n(u)) to n(u).

On the receiving side, in the first step a node v ∈ A gets 1/2 w(v) from every
neighbor in A*, and there are at most d − 1 of them (otherwise they would form
talons of a d-claw with center {v}). Thus v gets at most (d/2 − 1/2) w(v) in the
first step. Moreover, v gets at most 1/2 w(v) in the second step, otherwise the
nodes that send positive charges to v form talons of a good claw, and such a
claw cannot exist when WishfulThinking terminates.

While the goal of the WishfulThinking algorithm is the maximization of w(A),
an iteration may actually decrease w(A). Consider S = {v_0, ..., v_4} ⊂ A and
T = {u_1, u_2} ⊂ V − A and make the following assumptions:

(a) n(u) = v_0 for u ∈ T,

(b) w(u) = 18 for u ∈ T,

(c) N(u_i, A) = {v_0, v_{2i−1}, v_{2i}} for i = 1, 2, and

(d) w(v) = 10 for v ∈ S.

One can see that charge(u, v_0) = 18 − 1/2 · 30 = 3 for u ∈ T and that
3 + 3 > 1/2 · 10, thus T ∪ {v_0} is a nice claw. If we apply this claw to perform
an iteration of WishfulThinking, A changes into A − S ∪ T and w(A) decreases
by w(S) − w(T) = 50 − 36 = 14.
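The example can be checked numerically. The following is a minimal sketch, assuming A = S (no other candidate nodes) and using adjacency lists in which v0 comes first, so that ties for n(u) resolve to v0 as assumption (a) requires; the helper name `charge` mirrors the definition above, and the dictionary layout is an illustrative choice.

```python
# Weights from assumptions (b) and (d).
w = {f"v{i}": 10 for i in range(5)}
w.update({"u1": 18, "u2": 18})

# N(u_i, A) from assumption (c); v0 is listed first so that n(u) = v0.
adj = {"u1": ["v0", "v1", "v2"], "u2": ["v0", "v3", "v4"]}
A = {f"v{i}" for i in range(5)}
T = ["u1", "u2"]

def charge(u, v):
    nbrs = [x for x in adj[u] if x in A]          # N(u, A)
    n_u = max(nbrs, key=lambda x: w[x])           # n(u); max keeps the first maximum
    return w[u] - sum(w[x] for x in nbrs) / 2 if v == n_u else 0

total_charge = sum(charge(u, "v0") for u in T)
is_good = total_charge > w["v0"] / 2              # condition for a good claw

removed = {x for u in T for x in adj[u]}          # N(T, A)
new_A = (A - removed) | set(T)
delta_w = sum(w[x] for x in new_A) - sum(w[x] for x in A)
delta_w2 = sum(w[x] ** 2 for x in new_A) - sum(w[x] ** 2 for x in A)
```

Here `delta_w` is negative while `delta_w2` is positive, which is the behaviour that Lemma 2 below generalizes.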

Because WishfulThinking can alternate between increasing and decreasing 
w(A) we need the following lemma to show that it actually terminates. 

Lemma 2. If C is a nice claw, then T_C improves w²(A).

Proof. We will use T to denote T_C. We need to show that w²(A − N(T, A) ∪ T) >
w²(A).

Consider first the case when N(T, A) = ∅. In this case A − N(T, A) ∪ T = A ∪ T
and the claim is obvious.

In the remaining case Z_C = {v} ⊂ A. We will develop a condition that implies
that w²(A − N(T, A) ∪ T) > w²(A), and then we will show that if C is nice,
then T satisfies this condition.

By subtracting w²(A − N(T, A)) from both sides of w²(A − N(T, A) ∪ T) >
w²(A) we get an equivalent inequality

\[ w^2(T) > w^2(N(T, A)) \tag{1} \]

Observe that

\[ N(T, A) = \{v\} \cup \bigcup_{u \in T} N(u, A - \{v\}) \]

and therefore (1) is implied by

\[
\begin{aligned}
 & \sum_{u \in T} w^2(u) > w^2(v) + \sum_{u \in T} w^2(N(u, A - \{v\})) \\
 \equiv\; & \sum_{u \in T} \bigl( w^2(u) - w^2(N(u, A - \{v\})) \bigr) > w^2(v) \\
 \equiv\; & \sum_{u \in T} \frac{w^2(u) - w^2(N(u, A - \{v\}))}{w(v)} > w(v)
\end{aligned}
\tag{2}
\]



Now we will show that (2) holds if T ∪ {v} is a nice claw. Under this assumption,
T is a minimal set such that

\[ \sum_{u \in T} \mathrm{charge}(u, v) > \tfrac{1}{2}\, w(v) \tag{3} \]

Because set T is minimal, every term on the left-hand side of (3) is positive, and
in particular, n(u) = v. Thus to show (2) it suffices to show that

\[ \frac{w^2(u) - w^2(N(u, A - \{v\}))}{w(v)} \ge 2 \times \mathrm{charge}(u, v) \tag{4} \]

holds if charge(u, v) > 0. By the definition of charge, this is true if

\[ \frac{w^2(u) - w^2(N(u, A - \{v\}))}{w(v)} \ge 2w(u) - w(N(u, A)) \tag{5} \]

holds whenever v is an element of N(u, A) with the maximum weight and
2w(u) ≥ w(N(u, A)).

If we replace the weight function w with cw, then both sides of (5) will
be multiplied by c and this is an equivalent transformation. Therefore we may
assume, for the ease of calculations, that w(N(u, A)) = 2. Because 2w(u) ≥ 2,
we may assume that for some x ≥ 0 we have w(u) = 1 + x. In the proof of (5)
we consider two cases.

Case 1. w(v) = 1 + y for some y ≥ 0. Then w(N(u, A − {v})) = 1 − y, hence
w²(N(u, A − {v})) ≤ (1 − y)². Therefore (5) is implied by

\[
\begin{aligned}
 & \frac{(1 + x)^2 - (1 - y)^2}{1 + y} \ge 2x \\
 \equiv\; & 1 + 2x + x^2 - 1 + 2y - y^2 \ge 2x + 2xy \\
 \equiv\; & x^2 + 2y \ge y^2 + 2xy \\
 \equiv\; & x^2 - 2xy + y^2 \ge 2y^2 - 2y \\
 \equiv\; & (x - y)^2 \ge -2y(1 - y).
\end{aligned}
\]

Because 0 ≤ y ≤ 1, the last inequality is obvious.

Case 2. w(v) = 1 − y for some y > 0. Then w(N(u, A − {v})) = 1 + y, while
the largest weight in N(u, A − {v}) is at most 1 − y, hence w²(N(u, A − {v})) ≤
(1 + y)(1 − y) = 1 − y². Therefore (5) is implied by

\[
\begin{aligned}
 & \frac{(1 + x)^2 - (1 - y^2)}{1 - y} \ge 2x \\
 \equiv\; & 1 + 2x + x^2 - 1 + y^2 \ge 2x - 2xy \\
 \equiv\; & x^2 + y^2 \ge -2xy.
\end{aligned}
\]

Again, the last inequality is obvious.
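As a sanity check on the case analysis, inequality (5) can be tested exhaustively on a small grid of integer weights. The sketch below is illustrative only: the function names and the grid bounds are hypothetical choices, and a finite search is of course no substitute for the proof.

```python
from itertools import product

def violates(wu, nbr):
    """True if inequality (5) fails for w(u) = wu and N(u, A) with weights nbr."""
    wv = max(nbr)                                  # v = heaviest neighbour
    rest_sq = sum(x * x for x in nbr) - wv * wv    # w^2(N(u, A - {v}))
    # (5) multiplied through by w(v) > 0, so only integers are compared.
    return wu * wu - rest_sq < (2 * wu - sum(nbr)) * wv

def find_counterexample(grid, max_nbrs):
    for size in range(1, max_nbrs + 1):
        for nbr in product(grid, repeat=size):
            for wu in grid:
                # (5) is only claimed when 2 w(u) >= w(N(u, A)).
                if 2 * wu >= sum(nbr) and violates(wu, nbr):
                    return wu, nbr
    return None

counterexample = find_counterexample(range(1, 7), max_nbrs=3)
```

Equality in (5) occurs, for instance, when N(u, A) = {v} and w(u) = w(v), which is why (4) and (5) are non-strict.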

Lemma 2 allows us to relate algorithms WishfulThinking and SquareImp.
Because each nice claw improves w²(A), a run of WishfulThinking forms
the initial part of a run of SquareImp. Consequently, the number of iterations
performed by WishfulThinking is at most as large as the number of iterations
of SquareImp. In turn, when SquareImp terminates, we obtain a candidate
set A for which no claw improves w²(A), hence no nice claw may exist, hence
this candidate independent set satisfies the assumption of Lemma 1.

If we compare the virtues of these two algorithms, SquareImp has a more
succinct formulation, while WishfulThinking is more efficient: the search
space is smaller when we seek a nice claw than when we seek a claw that improves
w²(A), therefore the time needed to perform an iteration is smaller in the case
of WishfulThinking.

To see that the search space of WishfulThinking is indeed smaller, note
that we can approach it as follows. Given the current candidate A, for every node
u ∈ V − A we can compute n(u) and charge(u, n(u)). Then for a given v ∈ A we
need to inspect independent sets contained in n⁻¹(v) (possible sets of talons).
Moreover, we exclude the nodes that do not have a positive charge, and when
we evaluate a possible set of talons, we consider only its sum of charges, and we
do not need to compute its set of neighbors in A.

Without going into details of the running time analysis we can formulate the 
following theorem: 

Theorem 1. For every d there exists an algorithm that, given a d-claw free graph
with n nodes and k > 1, finds a solution to the w-MIS problem with approximation
ratio (k/(k−1)) · d/2 in time that is polynomial in kn.

Abstract. This paper presents approximation algorithms for two problems.
First, a randomized algorithm guaranteeing approximation ratio
√n with high probability is proposed for the Max-Rep problem of [Kor98],
or the Label-Cover_MAX problem (cf. [Hoc95]), where n is the number of
vertices in the graph. This algorithm is then generalized into a 4√n-ratio
algorithm for the nonuniform version of the problem. Secondly, it
is shown that the Red-Blue Set Cover problem of [CDKM00] can be
approximated with ratio 2√(n log β), where n is the number of sets and β
is the number of blue elements. Both algorithms can be adapted to the
weighted variants of the respective problems, yielding the same
approximation ratios.



1 Introduction 

1.1 Background 

Recent classifications of NP-hard problems by their approximability properties 
have led to the identification of a group of problems termed class III prob- 
lems in These problems can be informally characterized as ones known 

to have no approximation algorithm with ratio 2^(log^(1−ε) n) (for any 0 < ε < 1)
under some plausible complexity-theoretic assumption (such as NP ≠ P or
NP ⊄ DTIME(n^polylog(n))). We henceforth refer to this property as strong
inapproximability. This class includes problems such as the minimization and
maximization versions of Label-Cover [ABSS93], AND/OR Scheduling [GM97],
Minimum-Monotone-Satisfying-Assignment (MMSA) [ABMP98], Min-Rep and
Max-Rep [Kor98], Red-Blue Set Cover [CDKM00], and more.

While negative (strong inapproximability) results are known for all of those
problems (and indeed, in a certain sense they define the class), less is known
about positive (approximability) results. The current paper is concerned with
providing such results for some of the above problems.

1.2 The Problems Considered

Max-Rep, Label-Cover_MAX and related problems. The Max-Rep problem
is defined in [Kor98] as follows. We are given a bipartite graph G(U, W, E),
where U and W are each split into a disjoint union of k sets, U = A_1 ∪ ... ∪ A_k
and W = B_1 ∪ ... ∪ B_k. The sets A_i, B_i all have size m. Let A = {A_1, ..., A_k}
and B = {B_1, ..., B_k}. An instance of the problem consists of the 5-tuple
(U, W, E, A, B). The bipartite graph G and the partitions A and B of U and
W induce a bipartite super-graph H = (A, B, E_H). Two super-vertices A_i and
B_j are adjacent in H iff there exist some u ∈ A_i and w ∈ B_j which are adjacent
in G.

A set of vertices C ⊆ U ∪ W is said to cover the super-edge (A_i, B_j) if it
contains a pair of vertices u, w such that u ∈ A_i, w ∈ B_j and (u, w) ∈ E. The set
C is a legal cover for H if it contains at most one vertex from each super-vertex.
It is required to select a legal cover C for H covering the maximum number of
super-edges possible.

A minimization version of this problem, called Min-Rep, is also introduced
in [Kor98]. In this version, a cover C must cover every super-edge, but it may
contain any number of vertices from each super-vertex, and the goal is to select
a minimum size cover C for H.

A closely related problem is the Label-Cover problem, introduced in [ABSS93]
and presented in [Hoc95] as one of six canonical problems for proving hardness
of approximation. This problem has minimization and maximization versions
called Label-Cover_MIN and Label-Cover_MAX, which can be represented as
variants of the Min-Rep and Max-Rep problems respectively, except that the notion
of super-edge coverage is slightly different. Namely, a super-edge (A_i, B_j) is said
to be covered if for every vertex u ∈ A_i ∩ C there is a vertex w ∈ B_j ∩ C such
that (u, w) ∈ E. Note that the Label-Cover_MAX problem is equivalent to the
Max-Rep problem.

The Red-Blue Set Cover Problem. The Red-Blue Set Cover problem was
introduced in [CDKM00]. It is a natural generalization of the set-cover problem,
defined as follows. Consider a finite universe partitioned into two disjoint sets,
U = R ∪ B, where R = {r_1, ..., r_ρ} is a set of red elements and B = {b_1, ..., b_β}
is a set of blue elements. We are given a collection of sets over the universe U,
S = {S_1, S_2, ..., S_n}. For any subcollection S' ⊆ S, let

\[ U(S') = \bigcup_{S_i \in S'} S_i, \qquad B(S') = U(S') \cap B, \qquad R(S') = U(S') \cap R. \]

The goal is to choose a subcollection S' of S that covers all the elements of B
(i.e., such that B ⊆ B(S')) while minimizing |R(S')|, the number of red elements in
S'.
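The definitions above translate directly into code. The following is a small sketch, assuming sets of strings as elements; the helper names (`union`, `is_feasible`, `red_cost`, `brute_force`) and the three-set instance are hypothetical, and the exhaustive search is only meant to make the objective concrete on tiny inputs.

```python
from itertools import combinations

def union(sub):
    """U(S'): all elements appearing in the subcollection."""
    return set().union(*sub)

def is_feasible(sub, blue):
    """True if B is contained in B(S')."""
    return blue <= union(sub)

def red_cost(sub, red):
    """|R(S')|, the number of red elements used."""
    return len(union(sub) & red)

def brute_force(sets, red, blue):
    """Cheapest feasible subcollection by exhaustive search (exponential time)."""
    feasible = (sub for k in range(len(sets) + 1)
                for sub in combinations(sets, k) if is_feasible(sub, blue))
    return min(feasible, key=lambda sub: red_cost(sub, red))

red, blue = {"r1", "r2", "r3"}, {"b1", "b2"}
sets = [{"b1", "r1"}, {"b2", "r1", "r2"}, {"b1", "b2", "r3"}]
best = brute_force(sets, red, blue)
```

On this instance the third set alone covers both blue elements at the cost of one red element, whereas covering with the first two sets costs two.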

The Red-Blue Set Cover problem is also shown in [CDKM00] to be equivalent
to MMSA_3, the third level of the Minimum-Monotone-Satisfying-Assignment
problem introduced in [ABMP98].

1.3 Previous Work

The Label-Cover problem was introduced in [ABSS93], where it was also shown
to be strongly inapproximable. More precisely, it was shown that it is
quasi-NP-hard to approximate the problem with ratio 2^(log^(1−ε) n) for any
constant 0 < ε < 1 (or in other words, such approximation is impossible unless
NP ⊆ DTIME(n^polylog(n))). This result was recently improved in [DS99], by
weakening the complexity-theoretic assumption to NP ≠ P and allowing ε to be as
small as log log^(−c) n for any c < 1/2.

The Max-Rep and Min-Rep problems were introduced in [Kor98], where it
was also shown that both are strongly inapproximable.

The strong inapproximability of the Red-Blue Set Cover problem was shown
independently in [CDKM00] and [EP00]. Applications of the Red-Blue Set Cover
problem in a variety of domains, such as data mining applications, information
retrieval or general machine learning and classification, are discussed at length
in [CDKM00], as well as a number of special-case variants of the problem and
related problems, including Set Cover, Group Steiner and Directed Steiner, and
Minimum Color Path.

Few positive results exist for the above problems. The Red-Blue Set Cover
problem admits naive approximation algorithms with ratios β, ρ, or n log β. A
number of better approximation algorithms are given in [CDKM00] for this
problem. Specifically, letting k_B (respectively, k_R) denote the maximum number of
blue (resp., red) elements in any of the sets S_i, the paper presents approximation
algorithms with ratio 2√(k_B · n) or O(k_R log n). Hence these algorithms are
efficient when k_B or k_R are small, but their approximation ratio may be as high
as Ω(√(nβ)) or Ω(n log n), respectively, in the general case.

In [EP00] it is shown that the Min-Rep and Label-Cover_MIN problems
admit a √n-approximation ratio. It is also shown that the Min-Rep and
Label-Cover_MIN problems restricted to the cases where the girth of the induced
super-graph is greater than t, admit an approximation ratio. In particular, it
follows that the Min-Rep and the Label-Cover_MIN problems with girth greater
than log^ε n (for some constant ε > 0) are not strongly inapproximable.

1.4 Contributions 

The current paper presents approximation algorithms for two of the above prob- 
lems. 

The first algorithm, presented in Section 2, is a randomized algorithm
guaranteeing approximation ratio √n with high probability for the Max-Rep problem
(or for Label-Cover_MAX). (A simple deterministic variant was recently pointed
out by Y. Hassin [Has00].) This algorithm is then generalized into a 4√n-ratio
algorithm for the nonuniform version of the problem, in which there may be a
different number of sets in A and B, and these sets may have different sizes. The
algorithm can be generalized also to the weighted version of the problem, where
super-edges have real nonnegative weights, and the goal is to maximize the total
weight of covered super-edges.

In Section 3 we present an algorithm with approximation ratio 2√(n log β)
for the Red-Blue Set Cover problem. The algorithm can be generalized also to
the weighted version of the problem, in which every red element r_i ∈ R has a
positive real weight associated with it, and the goal is to minimize the weight of
the selected cover.



2 An Approximation Algorithm 
for the Max-Rep Problem 

2.1 The Uniform Case 

Let us start with some terminology. For every 1 ≤ i ≤ k, let A_i = {u_i^1, ..., u_i^m}
and B_i = {w_i^1, ..., w_i^m}. We think of the graph as drawn with the vertices of U
on the left and the vertices of W on the right.

Consider a cover C. Without loss of generality we may assume that C =
U^C ∪ W^C, where U^C (respectively, W^C) contains exactly one vertex u_i^C (resp.,
w_i^C) in each super-vertex A_i (resp., B_i) on the left (resp., right). In particular,
we denote by C* = U* ∪ W* the optimal solution to the problem, and let u_i*
(resp., w_i*) denote the unique vertex of U* ∩ A_i (resp., W* ∩ B_i).

For vertex subsets U' ⊆ U and W' ⊆ W, let G(U', W') denote the subgraph
of G induced by U' and W'. For a vertex u ∈ U' (resp., w ∈ W'), let deg(u, W')
(resp., deg(w, U')) denote its degree in G(U', W'). For u ∈ A_i, let Γ(u, H) denote
the set of super-vertices B_j neighboring u, namely, such that there is an edge
(u, w_j) ∈ E for some w_j ∈ B_j. Let sdeg(u, H) denote the super-degree of u,
namely, the cardinality of Γ(u, H). The super-degree of a vertex represents the
number of super-edges it can potentially cover.

For any cover C, let E(C) denote the set of super-edges of H covered by C.
Note that these are precisely the super-edges corresponding to the edges of the
graph G(U^C, W^C). Denote the cardinality of this set by f(C) = |E(C)|.

We first present two approximation procedures for the problem. The first of
the two has approximation ratio k, so it applies well in case there are few sets.

Procedure Few_Sets.

1. Calculate the super-degree sdeg(u, H) of every vertex u ∈ U.

2. Find the vertex u ∈ U with maximum super-degree.

3. Construct a set W̃ consisting of one neighbor w_i of u in every super-vertex
   B_i ∈ Γ(u, H).

4. Complete the set W̃ ∪ {u} into a cover C arbitrarily.

5. Output the cover C.
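The five steps can be sketched as follows. This is an illustrative version, not the authors' code: the instance is assumed to be given as lists of super-vertices `A` and `B` (each a list of vertex names) and a set `E` of (u, w) pairs, and "arbitrarily" in step 4 is taken to mean the first vertex of each super-vertex not yet represented.

```python
def f(cover, A, B, E):
    """Number of super-edges (A_i, B_j) covered by the vertex set `cover`."""
    return sum(
        any((u, w) in E for u in Ai if u in cover for w in Bj if w in cover)
        for Ai in A for Bj in B)

def few_sets(A, B, E):
    side = {w: j for j, Bj in enumerate(B) for w in Bj}   # vertex -> index of its B_j

    def gamma(u):
        """Indices of the super-vertices B_j neighbouring u."""
        return {side[w] for (x, w) in E if x == u}

    # Steps 1-2: vertex of maximum super-degree.
    best = max((u for Ai in A for u in Ai), key=lambda u: len(gamma(u)))
    # Step 3: one neighbour of `best` in each neighbouring super-vertex.
    pick = {}
    for (x, w) in sorted(E):
        if x == best:
            pick.setdefault(side[w], w)
    cover = {best} | set(pick.values())
    # Step 4: complete into a cover with one vertex per super-vertex.
    for Ai in A:
        if not cover & set(Ai):
            cover.add(Ai[0])
    for j, Bj in enumerate(B):
        if j not in pick:
            cover.add(Bj[0])
    return cover

A = [["a1", "a2"], ["a3", "a4"]]
B = [["b1", "b2"], ["b3", "b4"]]
E = {("a1", "b1"), ("a1", "b3"), ("a3", "b2"), ("a4", "b4")}
cover = few_sets(A, B, E)
```

The returned cover contains exactly one vertex per super-vertex, and it covers at least sdeg(u, H) super-edges for the chosen u.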



Lemma 1. Procedure Few_Sets yields a k-approximation for the Max-Rep
problem.

Proof. By the choice of u, sdeg(u_i*, H) ≤ f({u} ∪ W̃) ≤ f(C) for every 1 ≤ i ≤ k.
Subsequently,

\[ f(C^*) \le \sum_{1 \le i \le k} \mathrm{sdeg}(u_i^*, H) \le k \cdot f(C). \]
∎

Our second approximation procedure has approximation ratio 2m, so it
applies well in case the sets are small.

Procedure Small_Sets.

1. For every 1 ≤ i ≤ k, draw a vertex ũ_i ∈ A_i uniformly at random.

2. Let Ũ = {ũ_1, ..., ũ_k}.

3. For every 1 ≤ i ≤ k do:

(a) Compute the degree deg(w_i^j, Ũ) for every vertex w_i^j ∈ B_i.

(b) Let w̃_i be the vertex with maximum degree.

4. Let W̃ = {w̃_1, ..., w̃_k}.

5. Output the cover C̃ = (Ũ, W̃).
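A minimal sketch of Procedure Small_Sets, under the same assumed representation (lists of super-vertices and a set of edges); the function names and the explicit `rng` argument are illustrative choices, made so that runs are reproducible.

```python
import random
from itertools import product

def f(U, W, E):
    """Super-edges covered when U[i] represents A_i and W[j] represents B_j."""
    return sum((u, w) in E for u in U for w in W)

def small_sets(A, B, E, rng):
    # Steps 1-2: one uniformly random representative per A_i.
    U = [rng.choice(Ai) for Ai in A]
    # Step 3: in every B_i, keep the vertex of maximum degree into U.
    W = [max(Bi, key=lambda w: sum((u, w) in E for u in U)) for Bi in B]
    # Steps 4-5: the cover is the pair (U, W).
    return U, W

A = [["a1", "a2"], ["a3", "a4"]]
B = [["b1", "b2"], ["b3", "b4"]]
E = {("a1", "b1"), ("a1", "b3"), ("a3", "b2"), ("a4", "b4")}
U, W = small_sets(A, B, E, random.Random(0))
```

Because U holds exactly one vertex per A_i, the value f equals the sum of the degrees deg(w, U) over the chosen w, so picking each representative of B_i independently is optimal once U is fixed.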

For the analysis, we need to argue that the cover C̃ = (Ũ, W̃) constructed by
Procedure Small_Sets is not much worse than the optimal cover C* = U* ∪ W*.
To do that, let us first consider the intermediary “mixed” cover Ĉ = (Ũ, W*).

By the choice of W̃, it is clear that once Ũ is fixed, W* is no better than W̃.
Hence comparing C̃ to Ĉ, the following claim is immediate.

Lemma 2. f(C̃) ≥ f(Ĉ). ∎

On the other hand, comparing Ĉ to C* we have:

Lemma 3. E(f(Ĉ)) ≥ (1/m) · f(C*).

Proof. Let d_i = deg(ũ_i, W*) and d_i* = deg(u_i*, W*). Observe that if ũ_i = u_i* then
d_i = d_i*. As this happens with probability 1/m, we have that E(d_i) ≥ (1/m) · d_i*.

Noting that f(C*) = Σ_{i=1}^{k} d_i* and f(Ĉ) = Σ_{i=1}^{k} d_i, we conclude that

\[ E(f(\hat{C})) = \sum_{i=1}^{k} E(d_i) \ge \frac{1}{m} \sum_{i=1}^{k} d_i^* = \frac{1}{m} \cdot f(C^*). \]
∎

Corollary 1. E(f(C̃)) ≥ (1/m) · f(C*). ∎

To get this result with high probability, we apply the following procedure.

Procedure Small_Sets_2.

1. Set ℓ = 2m log n.

2. Invoke Procedure Small_Sets ℓ times.

3. Select the best result.

We rely on the following elementary fact.

Lemma 4. If X is a random variable in the range [0, ma] with expectation
E(X) = a, then the probability that X < a/2 is at most 1 − 1/(2m).
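The bound in Lemma 4 follows from a short averaging argument. The derivation below is a sketch, assuming a > 0 and writing q = Pr[X ≥ a/2]; the symbol q is introduced here for illustration only. It splits the expectation according to whether X falls below a/2 and bounds X by ma in the other case.

```latex
\[
a = \mathbb{E}(X)
  \le q \cdot ma + (1 - q) \cdot \frac{a}{2}
  \le q \cdot ma + \frac{a}{2}
\quad\Longrightarrow\quad
q \ge \frac{1}{2m}
\quad\Longrightarrow\quad
\Pr\Bigl[X < \frac{a}{2}\Bigr] = 1 - q \le 1 - \frac{1}{2m}.
\]
```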

By applying the above fact to the random variable f(C̃) with a = f(C*)/m,
we get that in each invocation of Procedure Small_Sets, the probability that
the gain of the resulting cover C̃ is f(C̃) < f(C*)/2m is at most 1 − 1/(2m).
Subsequently, the probability that the gain of none of the ℓ covers exceeds
f(C*)/2m is at most (1 − 1/(2m))^ℓ ≈ 1/n.

Corollary 2. With probability at least 1 − 1/n, Procedure Small_Sets_2 yields
a 2m-approximation for the Max-Rep problem. ∎

Finally, by combining Lemma 1 and Corollary 2 we conclude that applying
both procedures Few_Sets and Small_Sets_2 and selecting the better result
yields an approximation ratio of min{k, 2m}. As n = 2km, either k ≤ √n or
2m ≤ √n must hold, hence we have the following.

Theorem 1. There is a randomized algorithm yielding an approximation with
ratio √n with probability at least 1 − 1/n for the Max-Rep problem. ∎

Let us remark that a simple deterministic variant of Procedure Small_Sets,
and hence of the entire algorithm, was recently pointed out by Y. Hassin [Has00].

2.2 The Nonuniform Case 

Let us now generalize the approximation algorithm to the case where the
partitioning of the graph is nonuniform, i.e., there are k_U sets A_i and k_W sets B_i,
and each of those sets is possibly of different size.

It is easy to verify that the procedures described earlier still work correctly,
albeit with weaker approximation ratios. In particular, Procedure Few_Sets
will guarantee approximation ratio at most k_U. Analogously, a dual procedure
Few_Sets_2 which reverses the roles of the sets U and W (i.e., selects the best
vertex w ∈ W and bases the cover on w and its neighbors in U) will yield
ratio k_W. Procedure Small_Sets_2 will guarantee approximation ratio at most
m̂(Π) = max{|A_i| : 1 ≤ i ≤ k_U}. Unfortunately, in a nonuniform instance, all
of these bounds might be simultaneously as large as Ω(n).

To get a better bound, we partition the problem into four subproblems as
follows. First, split the sets A_i and B_i into large and small ones, letting

\[ A_L = \{A_i : |A_i| > \sqrt{n}\}, \qquad A_S = \{A_i : |A_i| \le \sqrt{n}\}, \]
\[ B_L = \{B_i : |B_i| > \sqrt{n}\}, \qquad B_S = \{B_i : |B_i| \le \sqrt{n}\}, \]

and taking

\[ U_L = \bigcup_{A_i \in A_L} A_i, \qquad U_S = U \setminus U_L, \]
\[ W_L = \bigcup_{B_i \in B_L} B_i, \qquad W_S = W \setminus W_L. \]

The edge set E is partitioned accordingly into four subsets

\[ E_{XY} = E \cap (U_X \times W_Y), \quad \text{for } X, Y \in \{L, S\}. \]

The problem now splits into four subproblems, denoted Π_XY for X, Y ∈ {L, S},
where Π_XY is defined over the graph G_XY = (U_X, W_Y, E_XY) and the
super-graph H_XY induced by the super-vertices of A_X and B_Y.
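The split of the edge set can be sketched as below. The function name `split_instance` and the dictionary keyed by "LL", "LS", "SL", "SS" are hypothetical, and the sketch assumes that vertex names in U and W are distinct.

```python
import math

def split_instance(A, B, E):
    """Partition E into E_LL, E_LS, E_SL, E_SS by the size class of each endpoint's set."""
    n = sum(map(len, A)) + sum(map(len, B))      # number of vertices
    root = math.sqrt(n)
    size_class = {}
    for part in A + B:
        for v in part:
            size_class[v] = "L" if len(part) > root else "S"
    sub = {X + Y: set() for X in "LS" for Y in "LS"}
    for (u, w) in E:
        sub[size_class[u] + size_class[w]].add((u, w))
    return sub

# n = 10, so sets with more than sqrt(10) ~ 3.16 vertices are "large".
A = [["a1", "a2", "a3", "a4"], ["d"]]
B = [["b1", "b2", "b3", "b4"], ["c"]]
E = {("a1", "b1"), ("a1", "c"), ("d", "b2"), ("d", "c")}
parts = split_instance(A, B, E)
```

Every edge lands in exactly one of the four subsets, which is what makes the gains of the four induced covers add up to the gain of the original cover.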

Clearly, each cover C for the original problem induces four covers C_XY for
the subproblems, with

\[ f(C) = f(C_{LL}) + f(C_{LS}) + f(C_{SL}) + f(C_{SS}). \]

Subsequently, if it is possible to approximate each of the four subproblems Π_XY
for X, Y ∈ {L, S} separately, giving it a cover C_XY with ratio at most γ, then
we can guarantee an approximation for the original problem with ratio at most
4γ, simply by taking the largest of the resulting four covers and completing it
arbitrarily.

The crucial observation is that on subproblems Π_LL and Π_LS we have
k_{U_L} ≤ √n, so Procedure Few_Sets yields approximation ratio √n. Likewise,
on subproblem Π_SL we have k_{W_L} ≤ √n, so the dual Procedure Few_Sets_2
again yields approximation ratio √n. Finally, on subproblem Π_SS we have
m̂(Π_SS) ≤ √n, so Procedure Small_Sets_2 will yield approximation ratio √n.

Theorem 2. There is a randomized algorithm yielding an approximation with
ratio 4√n with probability at least 1 − 1/n for the Nonuniform Max-Rep problem. ∎

2.3 The Weighted Problem 

Finally, let us consider the weighted variant of the problem, in which every
super-edge (A_i, B_j) has a nonnegative real weight ω(A_i, B_j) associated with it, and the
goal is to maximize the weight of the selected cover. In the full paper we show
that Procedures Few_Sets, Few_Sets_2, Small_Sets and Small_Sets_2 can
be extended to the weighted setting with no change in the approximation ratio.
This is done by defining appropriate generalizations of the deg and sdeg functions
which take the super-edge weights into account. In particular, for a set E' of
super-edges, let ω(E') = Σ_{e ∈ E'} ω(e). For u ∈ A_i, let sdeg(u, H) denote the
super-degree of u in H, namely, the total weight ω({(A_i, B_j) | B_j ∈ Γ(u, H)}). The
super-degree of a vertex now represents the total weight of the super-edges it can
potentially cover. Denote the weight of the set E(C) by f(C) = ω(E(C)). Finally,
for every w ∈ W, let B(w) denote the set to which w belongs. For u ∈ A_i,
denote the set of neighbors of u in G by Γ(u, G). Then define
deg(u, W') = Σ_{w ∈ W' ∩ Γ(u, G)} ω(A_i, B(w)).

With these definitions, the entire analysis goes through with little change.
We thus get the following.

Theorem 3. There is a randomized algorithm yielding an approximation with
ratio 4√n with probability at least 1 − 1/n for the Nonuniform Weighted Max-Rep
problem. ∎



3 An Approximation Algorithm 

for the Red-Blue Set Cover Problem 

Let us first give some definitions. For every red element r_i ∈ R and set collection
S, let deg(r_i, S) denote the number of sets in S that contain r_i. Let Δ(S) =
max{deg(r_i, S) | r_i ∈ R}.

For a set S_i, a set collection S and a subset R' ⊆ R of red elements, denote
the set obtained by discarding the elements of R' from S_i by φ(S_i, R') = S_i \ R',
and let

\[ \varphi(S, R') = \{\varphi(S_i, R') \mid S_i \in S\}. \]

For every set S_i ∈ S, let r(S_i) = |R({S_i})|, and for every subcollection S' ⊆ S,
let r(S') = |R(S')|.

Let S* denote the optimal solution for the Red-Blue Set Cover problem on
the instance S.

3.1 The Greedy Procedure 

We make use of the following approximation procedure for the Red-Blue Set 
Cover problem. 

Procedure Greedy_RB.

1. Modify S into an instance T of the weighted set cover problem as follows.

(a) Take T = φ(S, R),

(b) Assign each set T_i = φ(S_i, R) in T a weight ω(T_i) = r(S_i).

2. Apply the greedy algorithm for weighted set cover to T, and generate a cover
   T̃.

3. Take the corresponding collection S̃ = {S_i | T_i ∈ T̃} as the resulting
   approximation.
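A compact sketch of Procedure Greedy_RB follows. It assumes the standard greedy rule for weighted set cover (repeatedly pick the set with the smallest weight per newly covered element); the function name `greedy_rb`, the index-based return value and the tiny instance are illustrative assumptions. The blue part S_i \ R plays the role of T_i and the number of red elements of S_i is its weight.

```python
def greedy_rb(sets, red, blue):
    """Return indices of the sets chosen by greedy weighted set cover on the blue parts."""
    if not blue <= set().union(*sets):
        raise ValueError("the collection does not cover all blue elements")
    uncovered = set(blue)
    chosen = []
    while uncovered:
        def price(i):
            gain = len(sets[i] & uncovered)           # newly covered blue elements
            if gain == 0:
                return float("inf")
            return len(sets[i] & red) / gain          # weight r(S_i) per covered element
        i = min(range(len(sets)), key=price)
        chosen.append(i)
        uncovered -= sets[i]
    return chosen

red, blue = {"r1", "r2", "r3"}, {"b1", "b2"}
sets = [{"b1", "r1"}, {"b2", "r1", "r2"}, {"b1", "b2", "r3"}]
chosen = greedy_rb(sets, red, blue)
```

Note that the weights count a red element once per set containing it, which is why the guarantee of this procedure degrades with Δ(S).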

For every subcollection T' of an instance T of the weighted set cover problem,
let ω(T') = Σ_{T_i ∈ T'} ω(T_i). It is easy to verify the following.

Lemma 5. For any set collection S' and corresponding instance T' = φ(S', R)
of the weighted set cover problem,

\[ r(S') \le \omega(T') \le \Delta(S) \cdot r(S'). \]

Proof. Note that

\[ \omega(T') = \sum_{T_i \in T'} \omega(T_i) = \sum_{S_i \in S'} r(S_i) = \sum_{r_j \in R(S')} \deg(r_j, S'). \]

As 1 ≤ deg(r_j, S') ≤ Δ(S') ≤ Δ(S) for every r_j ∈ R(S'), we get

\[ r(S') \le \sum_{r_j \in R(S')} \deg(r_j, S') \le \Delta(S) \cdot r(S'), \]

implying the claim. ∎

Lemma 6. Procedure Greedy_RB has an approximation ratio of Δ(S) · log β.

Proof. Denote the minimum-weight set cover for T by T̂ and let T* = φ(S*, R)
be the instance of the weighted set cover problem corresponding to S*. (Note that
T̂ and T* need not necessarily be the same.) It is known that the greedy
algorithm yields a log β approximation for the weighted set cover problem, namely,
ω(T̃) ≤ log β · ω(T̂) [Chv79]. Therefore, by Lemma 5,

\[ r(\tilde{S}) \le \omega(\tilde{T}) \le \log\beta \cdot \omega(\hat{T}). \]

The optimality of T̂ implies that ω(T̂) ≤ ω(T*). Combined, we get that

\[ r(\tilde{S}) \le \log\beta \cdot \omega(T^*). \]

Applying Lemma 5 again we get that

\[ r(\tilde{S}) \le \log\beta \cdot \Delta(S) \cdot r(S^*). \]
∎



3.2 The Main Procedure 

For an integer parameter X, we consider the following procedure. 

Procedure Low_Deg(X).

1. Discard from S the sets with more than X red elements, setting
   S_X ← {S_i ∈ S | r(S_i) ≤ X}.

2. If B(S_X) ≠ B then return S. /* S_X is not feasible */

3. Set Y = √(n / log β).

4. Separate the red elements into high and low degree elements, setting
   R_H ← {r_i ∈ R | deg(r_i, S_X) > Y} and R_L ← R \ R_H.

5. Discard the elements of R_H from S_X, setting S_{X,Y} ← φ(S_X, R_H).

6. Apply Procedure Greedy_RB to S_{X,Y}, and obtain a solution S̃_{X,Y}.

7. Complete the sets of S̃_{X,Y} into the corresponding sets of S_X (by adding to
   each set T_i ∈ S̃_{X,Y} originally obtained from S_i ∈ S_X the discarded elements
   S_i ∩ R_H), and return the resulting solution S̃_X.



Approximation of Label-CoveiMAx 229 

Lemma 7. |R_H| ≤ √(n log β) · X.

Proof. Each set S_i ∈ S_X has at most X red elements. Hence

\[ |R_H| \cdot Y \le \sum_{r_j \in R_H} \deg(r_j, S_X) \le \sum_{r_j \in R} \deg(r_j, S_X) = \sum_{S_i \in S_X} r(S_i) \le |S_X| \cdot X \le nX, \]

so |R_H| ≤ nX/Y, implying the lemma. ∎



3.3 The Approximation Algorithm 

Now let us set X̂ = max{r(S_i*) | S_i* ∈ S*}, and consider the performance of
Procedure Low_Deg when invoked with the parameter X = X̂.

Lemma 8. Procedure Low_Deg(X̂) yields an approximation ratio of at most
2√(n log β).

Proof. First observe that S_{X̂} is necessarily feasible. Hence the procedure will
always return a solution in its Step 7 (and not Step 2).

Let S* be some optimal solution for the problem, and let r*_H = |R(S*) ∩ R_H|
and r*_L = |R(S*) ∩ R_L|. Since Δ(S_{X̂,Y}) ≤ Y, Lemma 6 guarantees that the
solution produced by Procedure Low_Deg(X̂) uses at most Y · log β · r*_L =
√(n log β) · r*_L red elements of R_L. By Lemma 7 the number of red elements of
R_H contained in the solution generated by the procedure is at most √(n log β) · X̂.
Combined, the total number of red elements used by the procedure satisfies
r(S̃_{X̂}) ≤ √(n log β) · r*_L + √(n log β) · X̂. But by the definition of X̂, necessarily
r(S*) ≥ X̂, and hence r(S̃_{X̂}) ≤ 2√(n log β) · r(S*), yielding the lemma. ∎

As X̂ is not known to us in advance, it will be necessary to search for it. This
yields our final algorithm.

Algorithm Low_Deg2.

1. For X = 1 to ρ do:
   Invoke Procedure Low_Deg(X).

2. Take the best of the obtained solutions.
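Procedures Low_Deg and Low_Deg2 can be sketched together as follows. This is an illustrative version under several assumptions that are not in the text: sets are Python sets, solutions are lists of indices into `sets`, log β is taken as the natural logarithm of max(β, 2) to avoid dividing by zero, and at least one red element exists. The greedy subroutine is a plain price-per-element weighted set cover.

```python
import math

def red_count(sets, idx, red):
    """Number of red elements used by the sets with the given indices."""
    return len(set().union(*(sets[i] for i in idx)) & red)

def greedy_rb(sets, red, blue):
    """Greedy weighted set cover on the blue parts; weight = number of red elements."""
    uncovered, chosen = set(blue), []
    while uncovered:
        def price(i):
            gain = len(sets[i] & uncovered)
            return float("inf") if gain == 0 else len(sets[i] & red) / gain
        i = min(range(len(sets)), key=price)
        chosen.append(i)
        uncovered -= sets[i]
    return chosen

def low_deg(sets, red, blue, X):
    keep = [i for i, s in enumerate(sets) if len(s & red) <= X]        # step 1
    if not blue <= set().union(*(sets[i] for i in keep)):              # step 2
        return list(range(len(sets)))
    Y = math.sqrt(len(sets) / math.log(max(len(blue), 2)))             # step 3
    deg = {r: sum(r in sets[i] for i in keep) for r in red}
    high = {r for r in red if deg[r] > Y}                              # step 4
    trimmed = [sets[i] - high for i in keep]                           # step 5
    picked = greedy_rb(trimmed, red - high, blue)                      # step 6
    return [keep[j] for j in picked]                                   # step 7

def low_deg2(sets, red, blue):
    candidates = [low_deg(sets, red, blue, X) for X in range(1, len(red) + 1)]
    return min(candidates, key=lambda idx: red_count(sets, idx, red))

red, blue = {"r1", "r2", "r3"}, {"b1", "b2"}
sets = [{"b1", "r1"}, {"b2", "r1", "r2"}, {"b1", "b2", "r3"}]
solution = low_deg2(sets, red, blue)
```

Returning indices makes step 7 trivial: the original sets, including their discarded high-degree red elements, are recovered by indexing back into `sets`.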



Theorem 4. Algorithm Low_Deg2 yields an approximation ratio of 2√(n log β)
for the Red-Blue Set Cover problem. ∎

A minor variant of this algorithm yields an approximation ratio of 
However, the problem clearly admits also a trivial approximation algorithm of 
ratio p, and is always dominated by the smaller of p and ^/n, so this 

variant is not as interesting (assuming the factor of log β is negligible compared
to the other terms).

3.4 The Weighted Case

Finally, let us consider the weighted variant of the problem, in which every red
element r_i ∈ R has a positive real weight ω(r_i) associated with it, and the goal is
to minimize the weight of the selected cover. In the full paper we show that
Procedures Greedy_RB and Low_Deg can be extended to the weighted setting
with no change in the approximation ratio. In particular, in addition to the
previous definitions, define the weight of a set S_i to be ω(S_i) = Σ_{r_j ∈ S_i ∩ R} ω(r_j),
and for a subcollection S' let ω(S') = Σ_{r_j ∈ R(S')} ω(r_j). Procedure Greedy_RB, Step
1(b) should assign each set T_i in T the weight ω(T_i) = ω(S_i). The inequalities
of Lemma 5 become

\[ \omega(S') \le \omega(T') \le \Delta(S) \cdot \omega(S'), \]

with minimal changes in the proof, as well as in the proof of Lemma 6. In
Procedure Low_Deg, the definition of S_X changes to S_X ← {S_i ∈ S | ω(S_i) ≤
X}. As a result, Lemma 7 now asserts that ω(R_H) ≤ √(n log β) · X. The definition
of X̂ becomes X̂ = max{ω(S_i*) | S_i* ∈ S*}. The proof of Lemma 8 uses ω*_H =
ω(R(S*) ∩ R_H) and ω*_L = ω(R(S*) ∩ R_L) instead of r*_H and r*_L, respectively. We
thus get the following.
Theorem 5. There is an algorithm with approximation ratio 2√(n log β) for the
Weighted Red-Blue Set Cover problem. ∎

Abstract. The generalized maximum linear arrangement problem
is to compute for a given vector x ∈ ℝ^n and an n × n non-negative
symmetric matrix w = (w_ij), a permutation π of {1, ..., n} that
maximizes Σ_{i,j} w_{π_i,π_j} |x_j − x_i|. We present a fast 1/3-approximation algorithm
for the problem. We also introduce a 1/2-approximation algorithm for MAX
k-CUT WITH GIVEN SIZES. This matches the bound obtained by Ageev
and Sviridenko, but without using linear programming.



1 Introduction 

We define the Generalized Linear Arrangement Problem as the problem
of computing for a given vector x = (x_1 ≤ ... ≤ x_n) ∈ ℝ^n of ‘points’ and an
n × n non-negative symmetric matrix w = (w_ij) of ‘weights’, a permutation π
of {1, ..., n} so that Σ_{i,j} w_{π_i,π_j} |x_j − x_i| is optimized. In an illustrative example,
consider n linearly ordered points in which a set of n machines is to be located,
and w_ij is a measure of association of the i-th and j-th machines. Our interest
is in the maximization version, the GENERALIZED MAXIMUM LINEAR
ARRANGEMENT PROBLEM (GMLAP), where the goal is to maximize Σ_{i,j} w_{π_i,π_j} |x_j − x_i|,
and keep the machines far from each other (compare with 0). 

The special (NP-hard) case in which Xi = i is known as the linear ar- 
rangement PROBLEM. Another special case of the problem is MAX CUT prob- 
lem WITH GIVEN SIZES OF SIDES where for some p < Xi = ■ ■ ■ = Xp = 0 
and Xp+i = ■■■ = Xn = I- Ageev and Sviridenko |2] applied a novel method 
of rounding linear programming relaxations and developed a ^-approximation 
algorithm for this problem. (|2| contains a ^-approximation for the more general 
directed version of the problem.) They also obtained a similar result for a more
general MAX $k$-CUT problem in which integers $p_1,\dots,p_k$ are given and the goal
is to compute a $k$-cut, that is, a partition $S_1,\dots,S_k$ of $\{1,\dots,n\}$ with $|S_i| = p_i$,
$i = 1,\dots,k$, which maximizes the weight of edges whose ends are in different sides
of the partition.

The GMLAP is a special case of the MAXIMUM QUADRATIC ASSIGNMENT
PROBLEM. In this problem two $n \times n$ nonnegative symmetric matrices $A = (a_{ij})$
and $B = (b_{ij})$ are given and the objective is to compute a permutation
$\pi$ of $\{1,\dots,n\}$ so that $\sum_{i,j} a_{\pi_i,\pi_j} b_{ij}$ is maximized. A $\frac{1}{4}$-approximation
algorithm for this problem, under the assumption that the values of one of the
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matrices satisfy the triangle inequality, is given in Of course, this bound 
applies also to the GMLAP. 

We will present $\frac{1}{3}$-approximation algorithms for the GMLAP. An interesting
feature of our algorithm is that it simultaneously approximates the max cut
problems with sizes $p$ and $n - p$ for all possible values of $p$. We also present an
alternative $\frac{1}{2}$-approximation for the MAX $k$-CUT PROBLEM WITH GIVEN SIZES OF
THE SIDES. Unlike the algorithm of Ageev and Sviridenko, the latter algorithm
doesn't use linear programming. The full version of this paper also contains a
randomized ^-approximation for the MAXIMUM linear arrangement prob- 
lem. 

We first describe, in Section 2, a generic randomized approximation algorithm
for MAX CUT WITH GIVEN SIZES OF SIDES. The analysis of this special case will
be used in Section 3, where we obtain our main result on the GMLAP. In Section
4 we treat the MAX $k$-CUT problem with given sizes of the sides.

For a partition $(S,T)$ we mean by $(i,j) \in (S,T)$ that $i \in S$ and $j \in T$. We
denote by $opt$ the optimal solution value in the problem under consideration.

2 Max Cut with Given Sizes of Sides 

Given an undirected graph $G = (V,E)$ with edge weights $w_{ij}$, $(i,j) \in E$, and
$|V| = n$, a cut is a partition $(S,T)$ of $V$ and its weight is $\sum_{i \in S,\, j \in T} w_{ij}$. The
problem is to compute a maximum weight cut such that $|S| = p$. Without loss
of generality, we assume that $p \le \frac{n}{2}$.



Max_Cut
input
    1. A graph $G = (V,E)$, $V = \{1,\dots,n\}$, with edge weights $w_{ij}$, $(i,j) \in E$.
    2. An integer $p \le n/2$.
returns
    A partition $S, T$ of $V = \{1,\dots,n\}$ such that $|S| = p$.
begin
    for $i = 1,\dots,n$
        $W_i := \sum_{j : (i,j) \in E} w_{ij}$.
    end for
    $P := \{i_1,\dots,i_{2p} \in V \mid W_i \ge W_j \ \forall i \in P,\ j \notin P\}$.
    Randomly choose $p$ nodes from $P$ to form $S$.
    $T := V \setminus S$.
    return $S, T$.
end Max_Cut

Fig. 1. Algorithm Max_Cut
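The pseudocode of Fig. 1 can be sketched in Python as follows. This is a minimal illustration, assuming a dense symmetric weight matrix with 0-based vertices; the function names and the optional `rng` argument are not from the paper.

```python
import random


def max_cut_given_size(w, p, rng=random):
    """Randomized Max_Cut sketch: pick S uniformly among the 2p heaviest vertices."""
    n = len(w)
    if not 0 < 2 * p <= n:
        raise ValueError("need 1 <= p <= n/2")
    # W[i] is the total weight of the edges incident to vertex i.
    W = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    # P holds the 2p vertices with largest W (ties broken by index).
    P = sorted(range(n), key=lambda i: W[i], reverse=True)[:2 * p]
    S = set(rng.sample(P, p))
    return S, set(range(n)) - S


def cut_weight(w, S, T):
    """Weight of the edges with one end in S and the other in T."""
    return sum(w[i][j] for i in S for j in T)
```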



Theorem 1 Let $w(S,T)$ be the expected weight of the partition returned by
Algorithm Max_Cut (Figure 1). Then, $w(S,T) \ge \frac{1}{3}\,opt$.
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Proof: The probability that any given edge {ijJ) ij G P to be separated 
by $(S,T)$ is at least $\frac{1}{2}$, in both cases $j \in P$ and $j \notin P$. Consider an optimal
solution and denote by $OPT$ the set of size $p$ in it. Let $r = |OPT \cap P|$,
$s = \sum_{(i,j) \in (OPT \cap P,\, P \setminus OPT)} w_{ij}$, and $t = \sum_{(i,j) \in (OPT \cap P,\, V \setminus P)} w_{ij}$. Note that by
definition of $P$, there is some threshold $k$ such that $W_i$ is at least $k$ for $i \in P$ and at
most $k$ for $i \in OPT \setminus P$. Also note that edges with two ends in $P$ are counted
twice in $\sum_{i \in P} W_i$. Hence

$$w(S,T) \ge \frac{1}{2}\left(\frac{s + (2p-r)k}{2} + t\right) = \frac{s + (2p-r)k}{4} + \frac{t}{2}$$

and

$$opt \le (p-r)k + t + s.$$

We note that to compute a minimum possible value for the ratio $w(S,T)/opt$, given
that it can be made smaller than $\frac{1}{2}$, we can assume w.l.o.g. that $t = 0$. Let
$s = \alpha(2p-r)k$. We now distinguish two cases.

Suppose first that $\alpha \le 1$. We use

$$w(S,T) \ge k(1+\alpha)\,\frac{2p-r}{4}$$

and

$$opt \le k\big((2p-r)(1+\alpha) - p\big),$$

so that

$$\frac{w(S,T)}{opt} \ge \frac{1}{4} \cdot \frac{(2p-r)(1+\alpha)}{(2p-r)(1+\alpha) - p}.$$

This ratio is monotone decreasing in $\alpha$ so that for the worst case we substitute
$\alpha = 1$ and obtain

$$\frac{w(S,T)}{opt} \ge \frac{2p-r}{2(3p-2r)}.$$

This expression is increasing in $r$, so it is minimized when $r = 0$, in which case we
obtain the ratio $\frac{1}{3}$.

Suppose now that $\alpha > 1$. We use the inequalities

$$w(S,T) \ge \frac{s}{2}$$

and

$$opt \le (p-r)k + s \le \frac{2p-r}{2}\,k + s = \frac{s}{2\alpha} + s \le \frac{3}{2}\,s,$$

so that

$$\frac{w(S,T)}{opt} \ge \frac{1}{3}.$$



Algorithm Max_Cut is fast and simple, but for our results in the next section
we will use a variation of it, which is also easier for derandomization. In this
variation we change the main step of the algorithm as described in Figure 2.

Theorem 2 Let $w(S,T)$ be the expected weight of the partition returned by
Max_Cut with the modification given in Figure 2. Then, $w(S,T) \ge \frac{1}{3}\,opt$.

Proof: As in Theorem 1, with $P = \{1,\dots,2p\}$.
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begin 

    Sort $V$ in non-increasing order of $W_i$.
    (For simplicity, suppose that $W_1 \ge \dots \ge W_n$.)
    for $i = 1,\dots,p$
        Assign $2i - 1$ to $S$ with probability $\frac{1}{2}$. Assign $2i$ to $S$ otherwise.
    end for

Fig. 2. Modified Max_Cut
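A minimal Python sketch of the modified step in Fig. 2, under the same assumptions as before (dense symmetric matrix, 0-based vertices, hypothetical function name): consecutive vertices in the sorted order are paired, and one vertex of each of the first p pairs is sent to S by a fair coin flip.

```python
import random


def paired_max_cut(w, p, rng=random):
    """Modified Max_Cut sketch: one vertex from each of the p heaviest pairs goes to S."""
    n = len(w)
    if not 0 < 2 * p <= n:
        raise ValueError("need 1 <= p <= n/2")
    W = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: W[i], reverse=True)
    S = set()
    for i in range(p):
        first, second = order[2 * i], order[2 * i + 1]
        # Fair coin: the first vertex of the pair joins S, otherwise the second.
        S.add(first if rng.random() < 0.5 else second)
    return S, set(range(n)) - S
```

Because every pair contributes exactly one vertex, this variant needs only p independent coin flips, which is what makes the later derandomization straightforward.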



3 Generalized Maximum Linear Arrangement 

We start by presenting an alternative way to compute the weight of a solution
$\pi$ to GMLAP (cf. |H|): For $p = 1,\dots,n-1$ let $C_p = \sum_{i=1}^{p} \sum_{j=p+1}^{n} w_{\pi_i,\pi_j}$. Note
that the problem of maximizing $C_p$ over all permutations $\pi$ of $\{1,\dots,n\}$ is the
max cut problem with sizes of sides $p$ and $n - p$. Now we observe that

$$\sum_{i,j} w_{\pi_i,\pi_j} |x_j - x_i| = \sum_{p=1}^{n-1} C_p\,|x_{p+1} - x_p|. \qquad (1)$$

In other words, the contribution of the interval $[x_p, x_{p+1}]$ to the weight of the
solution is $C_p |x_{p+1} - x_p|$. Our algorithm Randomized_GMLA (see Figure 3)
approximates simultaneously all of these cut problems with factor $\frac{1}{3}$ each, and
consequently the same bound applies to the GMLAP instance as well.
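The decomposition into cuts can be checked numerically. The sketch below is an illustration under stated assumptions: points sorted in non-decreasing order, 0-based indices, the left-hand sum taken over unordered pairs, and hypothetical helper names.

```python
def cut_values(w, pi):
    """List of C_p for p = 1..n-1: weight between the first p and the last n-p positions."""
    n = len(pi)
    return [sum(w[pi[i]][pi[j]] for i in range(p) for j in range(p, n))
            for p in range(1, n)]


def value_via_cuts(x, w, pi):
    """Right-hand side: every gap x[q+1] - x[q] weighted by the cut crossing it."""
    C = cut_values(w, pi)
    return sum(C[q] * abs(x[q + 1] - x[q]) for q in range(len(x) - 1))


def value_direct(x, w, pi):
    """Left-hand side: sum over unordered position pairs."""
    n = len(x)
    return sum(w[pi[i]][pi[j]] * abs(x[j] - x[i])
               for i in range(n) for j in range(i + 1, n))
```

An edge between positions i < j crosses exactly the gaps between x_i and x_j, so its weight is counted once per crossed gap; summing the gap lengths recovers x_j - x_i.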



Randomized_GMLA
input
    1. A non-negative symmetric matrix $W = (w_{ij})$, $i,j = 1,\dots,n$.
    2. A set of points $x_1,\dots,x_n \in \mathbb{R}$.
returns A permutation $\pi_1,\dots,\pi_n$ of $V = \{1,\dots,n\}$.
begin
    for $i = 1,\dots,n$
        $W_i := \sum_{j \in V \setminus \{i\}} w_{ij}$.
    end for
    Sort $V$ in non-increasing order of $W_i$.
    (For simplicity, suppose that $W_1 \ge \dots \ge W_n$.)
    for $i = 1,\dots,\lfloor n/2 \rfloor$
        Set $\pi_i := 2i - 1$ and $\pi_{n-i+1} := 2i$ with probability $\frac{1}{2}$.
        Set $\pi_i := 2i$ and $\pi_{n-i+1} := 2i - 1$ otherwise.
        If $n$ is odd, set $\pi_{(n+1)/2} := n$.
    end for
    return $\pi$.
end Randomized_GMLA

Fig. 3. Algorithm Randomized_GMLA
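A Python sketch of Fig. 3 follows. It assumes 0-based indices, so position i in the paper corresponds to index i - 1 here, and `pi[q]` is the vertex placed at the q-th smallest point; the function name and `rng` argument are illustrative. Note that the points themselves are not needed to build the permutation, only their order.

```python
import random


def randomized_gmla(w, rng=random):
    """Place the two vertices of each heavy pair at mirrored positions q and n-1-q."""
    n = len(w)
    W = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: W[i], reverse=True)
    pi = [None] * n
    for q in range(n // 2):
        left, right = order[2 * q], order[2 * q + 1]
        if rng.random() < 0.5:
            left, right = right, left
        pi[q], pi[n - 1 - q] = left, right
    if n % 2 == 1:
        # The lightest vertex takes the middle position when n is odd.
        pi[n // 2] = order[-1]
    return pi
```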



Approximation Algorithms for Maximum Linear Arrangement 235 



Theorem 3 Let $\pi$ be the permutation returned by Randomized_GMLA.

1. Let $S_p = \{\pi_1,\dots,\pi_p\}$, $T_p = \{\pi_{p+1},\dots,\pi_n\}$. Then $(S_p, T_p)$ is a $\frac{1}{3}$-approximation
for the max cut problem with sizes of sides $p$ and $n - p$.

2. $\pi$ is a $\frac{1}{3}$-approximation for the GMLAP.

Proof: By Theorem 2, for $p = 1,\dots,n-1$, the value of $C_p$ in the output of the
algorithm is a $\frac{1}{3}$-approximation for the respective max cut problem. The proof
for the second part of the theorem follows now from Equation (1). ■

Derandomizing the algorithm is particularly simple. We apply the 'method
of conditional expectations'. Consider the $i$-th iteration of the algorithm. We
should set $\pi_i$ to either $2i - 1$ or to $2i$, and $\pi_{n-i+1}$ to the other value. This is
done so that the expected value of the solution is maximized given the previous
assignments and assuming that the following ones will be done according to
Randomized_GMLA. We call the resulting algorithm GMLA.
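The method of conditional expectations can be sketched as below. This is a deliberately simple illustration, not the implementation behind the running time stated in the paper: it recomputes each conditional expectation from scratch, so it takes cubic time. It assumes sorted points, 0-based indices, an objective summed over unordered pairs, and hypothetical helper names.

```python
def degree_order(w):
    """Vertices sorted by non-increasing total incident weight W_i."""
    n = len(w)
    W = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    return sorted(range(n), key=lambda i: W[i], reverse=True)


def position_options(n, order, choices):
    """Map each vertex to its possible (position, probability) pairs.

    choices[i] is True when order[2i] takes position i and order[2i+1]
    takes position n-1-i; pairs beyond len(choices) are still random.
    """
    opts = {}
    for i in range(n // 2):
        a, b = order[2 * i], order[2 * i + 1]
        lo, hi = i, n - 1 - i
        if i < len(choices):
            if not choices[i]:
                a, b = b, a
            opts[a], opts[b] = [(lo, 1.0)], [(hi, 1.0)]
        else:
            opts[a] = opts[b] = [(lo, 0.5), (hi, 0.5)]
    if n % 2:
        opts[order[-1]] = [(n // 2, 1.0)]
    return opts


def expected_value(x, w, order, choices):
    """Conditional expectation of the objective given the fixed choices."""
    n = len(x)
    opts = position_options(n, order, choices)
    pair_of = {v: rank // 2 for rank, v in enumerate(order)}
    total = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            if pair_of[u] == pair_of[v]:
                # Partners always occupy the mirrored positions i and n-1-i.
                i = pair_of[u]
                total += w[u][v] * (x[n - 1 - i] - x[i])
            else:
                # Different pairs are oriented by independent coin flips.
                total += w[u][v] * sum(pu * pv * abs(x[qu] - x[qv])
                                       for qu, pu in opts[u]
                                       for qv, pv in opts[v])
    return total


def gmla(x, w):
    """Fix each pair's orientation to the one with larger conditional expectation."""
    n = len(x)
    order = degree_order(w)
    choices = []
    for _ in range(n // 2):
        keep = expected_value(x, w, order, choices + [True])
        swap = expected_value(x, w, order, choices + [False])
        choices.append(keep >= swap)
    pi = [None] * n
    for v, options in position_options(n, order, choices).items():
        pi[options[0][0]] = v
    return pi
```

Since each conditional expectation is the average of its two refinements, always taking the larger one yields a deterministic arrangement whose weight is at least the expected weight of the randomized algorithm.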

Theorem 4 Let $m = |\{(i,j) : w_{ij} > 0\}|$. Then, Algorithm GMLA computes a
$\frac{1}{3}$-approximation for the GMLAP and for MAX CUT WITH GIVEN SIZES OF THE
SIDES for every $p = 1,\dots,n-1$, in time $O(m + n \log n)$.

4 Max $k$-Cut with Given Sizes of the Sides

Given a graph $G = (V,E)$ with edge weights $w$ and integers $p_1,\dots,p_k$ such that
$\sum p_i = n$, the MAX $k$-CUT WITH GIVEN SIZES OF THE SIDES problem is to compute a
$k$-cut, that is, a partition $S_1,\dots,S_k$ of $V$ such that $|S_i| = p_i$, $i = 1,\dots,k$, which
maximizes the weight of edges whose ends are in different parts of the partition.

A vertex $v \in V$ is said to cover the weight of the edges $\{(v,u) \in E\}$. A
subset $V' \subseteq V$ covers the weight of the union of edges which have at least one
end in it. Bar- Yehuda 0 developed an O(n^) ^-approximation algorithm for the 
following problem: Given w, compute a vertex set of minimum size that covers 
edge weight of size at least w. 

One can obtain from this result, in a straightforward way, a solution to the
following problem: Given $p \le \frac{1}{2}n$, find a set $S'$ of $2p$ vertices that covers edge
weight of at least $w(p)$, where $w(p)$ is the maximum edge weight that can be
covered by $p$ vertices. To achieve this goal we apply binary search over $[0, w(E)]$,
where $w(E)$ is the total weight of $E$. For each test value $w$ we apply
Bar-Yehuda's algorithm, and we stop with the highest value for which the algorithm
returns a set $S'$ with at most $2p$ vertices. The complexity of this procedure is
0(n^ logw{E)). 

Our algorithm for the case of $k = 2$ (MAX CUT WITH GIVEN SIZES OF THE
SIDES) proceeds as follows: Randomly select $p$ vertices from $S'$ and move them
to the other side of the cut. Let the resulting set be $S_a$. We claim that the
expected size of the cut $(S_a, V \setminus S_a)$ is a $\frac{1}{2}$-approximation for the problem. The
argument is that the weight of the edges covered by $S'$ is an upper bound on the
optimal solution value, and each of these edges will be in the cut with probability
at least $\frac{1}{2}$. The algorithm can be derandomized by applying the 'method of conditional
expectations'. Alternatively, the rounding method of Ageev and Sviridenko can
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also be used to obtain a ^-approximation in deterministic linear time, once S' 
is given p. 

Our algorithm can be modified for the MAX $k$-CUT PROBLEM WITH GIVEN
SIZES OF THE SIDES as well: We first observe that if $p_i \le \frac{n}{2}$ for every $i = 1,\dots,k$
then the cut contains more than half of the edges so that a random solution has
expected weight of at least half the total weight of the graph. Thus a random
solution suffices to obtain a $\frac{1}{2}$-approximation.
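The random solution mentioned above can be drawn by shuffling the vertices and cutting the shuffled list into blocks of the prescribed sizes. The sketch below is illustrative; the function names, the 0-based vertices and the dense weight matrix are assumptions.

```python
import random


def random_k_cut(n, sizes, rng=random):
    """Uniformly random partition of range(n) into parts of the given sizes."""
    if sum(sizes) != n:
        raise ValueError("sizes must sum to n")
    vertices = list(range(n))
    rng.shuffle(vertices)
    parts, start = [], 0
    for size in sizes:
        parts.append(set(vertices[start:start + size]))
        start += size
    return parts


def k_cut_weight(w, parts):
    """Total weight of edges whose ends lie in different parts."""
    part_of = {v: a for a, part in enumerate(parts) for v in part}
    n = len(w)
    return sum(w[u][v] for u in range(n) for v in range(u + 1, n)
               if part_of[u] != part_of[v])
```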

Assume now that $p_1 > \frac{n}{2}$. We compute as above sets $S'$ and $S_a$ with $p =
n - p_1$. We set $P_1 = V \setminus S_a$ and arbitrarily partition $S_a$ to form $P_2,\dots,P_k$. The
resulting $k$-cut has the property that the expected weight of edges between $P_1$
and the other parts is already half the weight of the edges covered by $S'$, which is
itself an upper bound on the optimal solution. Thus the $k$-cut we constructed is
a $\frac{1}{2}$-approximation for the problem. Again, the algorithm can be derandomized.
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Abstract. We consider the problem of partitioning the nodes of a com- 
plete edge weighted graph into k clusters so as to minimize the sum 
of the diameters of the clusters. Since the problem is NP-complete, our 
focus is on the development of good approximation algorithms. When 
edge weights satisfy the triangle inequality, we present the first approx- 
imation algorithm for the problem. The approximation algorithm yields 
a solution that has no more than $10k$ clusters such that the total diameter
of these clusters is within a factor $O(\log(n/k))$ of the optimal value
for $k$ clusters, where $n$ is the number of nodes in the complete graph.

For any fixed $k$, we present an approximation algorithm that produces $k$
clusters whose total diameter is at most twice the optimal value. When
the distances are not required to satisfy the triangle inequality, we show
that, unless P = NP, for any $\rho > 1$, there is no polynomial time approximation
algorithm that can provide a performance guarantee of $\rho$ even
when the number of clusters is fixed at 3. Other results obtained include
a polynomial time algorithm for the problem when the underlying graph 
is a tree with edge weights. 
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1 Introduction 



1.1 Motivation 



The main goal of clustering is to partition a set of objects into homogeneous 
and well separated subsets (clusters) . Clustering techniques have been used in a 
wide variety of application areas including information retrieval, image process- 
ing, pattern recognition and database systems [IHahTjZH.I jt)ti|.l I J88|1JH73] . Over 
the last three decades, several clustering methods have been developed for spe- 
cific applications [H.lt)7f.l I Many of these methods define a distance (or a 
similarity measure) between each pair of objects, and partition the collection 
into clusters so as to optimize a suitable objective based on the distances. Some 
of the objectives that have been studied in the literature include minimizing the 
maximum diameter or radius, total pairwise distances in clusters, etc. The survey
paper by Hansen and Jaumard [HJ97] provides an extensive list of clustering
objectives and applications for these objectives.

Clustering problems where the objective is to minimize the maximum cluster
diameter have been well studied from an algorithmic point of view (see Section 1.4
for a summary). The focus of this paper is on clustering problems where
the objective is to partition a given collection of objects into a specified number
of clusters so as to minimize the sum of the diameters of individual clusters. The
motivation for this objective is derived from the fact that in several applications,
clustering algorithms that minimize the maximum diameter produce a “dissection
effect” [HJ97, MS89]. This effect causes objects that should normally belong
to the same cluster to be assigned to different clusters, as otherwise the diameter
of a cluster becomes too large. In such applications, the sum of diameters
objective is more useful as it reduces the dissection effect [HJ97, MS89].



1.2 Problem Formulation and Previous Work 

To study the clustering problem in a general setting, we represent the objects
to be clustered as nodes of a complete edge-weighted undirected graph $G(V, E)$
with $|V| = n$. The distance (or similarity measure) between any pair of objects
can then be represented as the weight of the corresponding edge in $E$. For an
edge $\{u, v\}$ in $E$, we use $\omega(u,v)$ to denote the weight of the edge. It is assumed
that the edge weights are nonnegative. For any subset $V'$ of $V$, the diameter of
$V'$ (denoted by $\mathrm{DIA}(V')$) is the weight of a largest edge in the complete subgraph
of $G$ induced on $V'$. Note that when $|V'| = 1$, $\mathrm{DIA}(V') = 0$. A formal statement
of the clustering problem considered in this paper is as follows.

Clustering to Minimize Sum of Diameters (Cmsd)

Instance: A complete graph $G(V,E)$, a nonnegative weight (or distance) $\omega(u,v)$
for each edge $\{u, v\}$ in $E$ and an integer $k \le |V|$.

Requirement: Partition $V$ into $k$ subsets $V_1, V_2, \dots, V_k$ such that $\sum_{i=1}^{k} \mathrm{DIA}(V_i)$
is minimized.
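As a small illustration of the objective, the following Python sketch computes the diameter of a cluster and the total diameter of a clustering. It assumes the weights are given as a dense symmetric matrix indexed by 0-based vertices; the function names are hypothetical.

```python
def dia(w, cluster):
    """Weight of a largest edge inside the cluster; 0 for singletons and empty sets."""
    c = list(cluster)
    return max((w[u][v] for a, u in enumerate(c) for v in c[a + 1:]), default=0)


def total_diameter(w, clusters):
    """Objective of Cmsd: the sum of the diameters of the clusters."""
    return sum(dia(w, c) for c in clusters)
```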





Approximation Algorithms for Clustering 239 



In general, edge weights in instances of Cmsd need not satisfy the triangle
inequality. We use Cmsd$_\Delta$ to denote instances of Cmsd where edge weights
satisfy the triangle inequality. Most of our results are for the Cmsd$_\Delta$ problem.
We assume without loss of generality that the optimal solution value to any
given instance of Cmsd$_\Delta$ is strictly greater than zero. We may do so since it is
easy to determine whether a given instance of Cmsd$_\Delta$ can be partitioned into a
specified number of clusters each of which has a diameter of zero.

We now summarize the known results from the algorithmic literature for the
Cmsd problem. Brucker [Br78] showed that Cmsd (without triangle inequality)
is NP-complete for any fixed $k \ge 3$. Hansen and Jaumard studied the

Cmsd problem with k = 2 and presented an algorithm with a running time 
of 0{n^ log n). They also showed that for fc = 2, the minimization problem for 
any given function of the two diameters can be solved in O(n^) time. When 
the input is specified as an undirected edge weighted graph with $n$ nodes and
$m$ edges, Monma and Suri [MS89] showed that the Cmsd problem for $k = 2$
can be solved in time $O(nm \log n)$. This is an improvement over the algorithm of
Hansen and Jaumard for sparse graphs. Brucker [Br78] observed that the 1-dimensional version
of Cmsd$_\Delta$ can be solved efficiently for any value of $k$. For the Euclidean version
of Cmsd$_\Delta$ with $k = 2$, Monma and Suri [MS89] presented an algorithm which
uses 0{n) space and runs in O(n^) time. Capoyleas et al. ICRW91I also studied 
a generalized version of the Cmsd^ problem for points in 51?^. They showed that 
for any fixed $k$, the problem can be solved in polynomial time for any monotonic
increasing function of cluster radius or diameter. Examples of such monotonic 
increasing functions include sum of diameters (or radii), maximum diameter (or 
radius), etc. 



1.3 Summary of Main Results 

We study the complexity and approximability of the Cmsd problem. The main 
results of this paper can be summarized as follows: 

1. We show that unless P = NP, Cmsd cannot be efficiently approximated to
within any factor even when the number of clusters is fixed at 3. (In contrast,
note that Cmsd is known to be efficiently solvable when the number of
clusters is equal to 2 [HJ87, MS89].)

2. For Cmsd$_\Delta$, we show that if the constraint on the number of clusters must
be met, then it is NP-hard to approximate the total diameter to within a
factor $2 - \epsilon$, for any $\epsilon > 0$.

3. In contrast to the non-approximability results above, we present a polynomial
time bicriteria approximation algorithm [MR+98] for Cmsd$_\Delta$. This
approximation algorithm outputs a solution with at most $10k$ clusters whose
total diameter is within a factor of $O(\log(n/k))$ of the minimum possible
total diameter with $k$ clusters.

4. We also show that when the number of clusters $k$ is fixed, there is an
approximation algorithm for Cmsd$_\Delta$ which produces at most $k$ clusters whose total
diameter is within a factor of 2 of the minimum possible total diameter.

A brief summary of our other results is given in Section 0 
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1.4 Other Related Work 

A number of researchers have addressed the clustering problem where the goal
is to minimize the maximum diameter or radius of a cluster. In the location
theory literature, the problem of minimizing the maximum radius is also known
as the $k$-center problem. For the metric version of the problem of minimizing
the maximum diameter, Gonzalez !c^ presented a simple greedy heuristic 
that runs in $O(nk)$ time and provides a performance guarantee of 2. He also
showed that, unless P = NP, the performance guarantee cannot be improved.
Using a general technique for approximating bottleneck problems, Hochbaum
and Shmoys [HS86] also presented a heuristic with a performance guarantee of
2 for the metric version of the $k$-center problem.

In [FPT81, MS84], it is shown that the problems of minimizing the maximum
radius or diameter remain NP-hard even for points in For this geometric ver- 
sion, Feder and Greene [FTT^ improved the running time of Gonzalez’s heuris- 
tic to 0(n log n). They also showed that it is NP-hard to achieve a performance 
guarantee of 1.82 and 1.97 respectively for the diameter and radius problems in 

Recently, Agarwal and Procopiuc |AP98j have presented an exact algorithm 
with a running time of ^ for the fc-center problem for points in 5R'*. For 

any e > 0, they have also presented an (1 -|- e) approximation algorithm with a 
running time of 0(n log fc) -|- ^ for the problem. 

Plesnik fPM) has addressed the problem of partitioning the edges of a given 
graph $G(V, E)$ into $k$ subsets so that each subset forms a connected graph on the
vertex set $V$, and a given function of the diameters of the resulting subgraphs is
minimized. The objectives considered in include minimizing the maximum 

diameter and minimizing the total diameter. It is shown that, unless P = NP,
even for $k = 2$, these objectives cannot be efficiently approximated to within
factors less than 3/2 and 5/4 respectively.

Several other types of clustering problems have also been studied in the
literature. For example, Charikar et al. [CC+97] study an incremental version
of the clustering problem for minimizing the maximum radius. Pferschy et al.
[PRW94] study geometric versions of clustering problems using objectives such
as minimizing the total perimeter. Agarwal and Procopiuc study pro- 

jective clustering problems where the goal is to cover a set of points in 3?"^ 
using hyper-strips, and the objective is to minimize the maximum width of the 
strips. References where other types of clustering problems are studied include 




2 Preliminaries 

2.1 A Simple Upper Bound on the Optimal Solution Value 

Given any instance of Cmsd, we can easily construct a feasible solution consisting
of $k$ clusters with total diameter at most the maximum edge weight: form one
cluster consisting of $n - k + 1$ arbitrarily chosen vertices and make each of the
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remaining k — 1 vertices a singleton cluster. This observation is stated formally 
below. 

Remark 1. For any instance $I$ of Cmsd, the optimal solution value is at most
the maximum edge weight in $I$. □
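The construction behind Remark 1 is short enough to write out. The sketch below assumes 0-based vertices and a dense weight matrix; `trivial_clustering` and `dia` are hypothetical helper names.

```python
def dia(w, cluster):
    """Weight of a largest edge inside the cluster; 0 for singletons."""
    c = list(cluster)
    return max((w[u][v] for a, u in enumerate(c) for v in c[a + 1:]), default=0)


def trivial_clustering(n, k):
    """One cluster with n-k+1 vertices followed by k-1 singleton clusters."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    big = list(range(n - k + 1))
    return [big] + [[v] for v in range(n - k + 1, n)]
```

Only the large cluster has a non-zero diameter, and that diameter is the weight of some edge, so the total cannot exceed the maximum edge weight.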

2.2 A Merging Lemma 

The formulation of the Cmsd problem requires that the clusters be pairwise disjoint.
Our approximation algorithms may produce clusters which may not satisfy
the disjointness condition. The following lemma points out that for instances of
Cmsd$_\Delta$, we can merge pairs of intersecting sets without increasing the total
diameter.

Lemma 1. Let $I$ be an instance of Cmsd$_\Delta$ given by the edge weighted complete
graph $G(V, E)$ and integer $k$. Let $\mathcal{C} = \{C_1, C_2, \dots, C_k\}$ be a collection of subsets
of $V$ such that their union is $V$ and the sum of the diameters of all the subsets
in $\mathcal{C}$ is $\Psi$. Further, suppose $C_i$ and $C_j$ ($i \ne j$) are two sets in $\mathcal{C}$ such that
$C_i \cap C_j \ne \emptyset$. Then the total diameter of the collection $\mathcal{C}'$ obtained by deleting $C_i$
and $C_j$ from $\mathcal{C}$ and adding the set $C_i \cup C_j$ is at most $\Psi$.

Proof. The lemma would follow by showing that $\mathrm{DIA}(C_i \cup C_j) \le \mathrm{DIA}(C_i) +
\mathrm{DIA}(C_j)$. To do this, let $x$ be a node in $C_i \cap C_j$ and let $u$ and $v$ be two nodes in
$C_i \cup C_j$ such that $\omega(u, v) = \mathrm{DIA}(C_i \cup C_j)$. If $u$ and $v$ are both in $C_i$ (or both in $C_j$),
then $\omega(u, v) \le \mathrm{DIA}(C_i)$ ($\omega(u, v) \le \mathrm{DIA}(C_j)$), and the proof is trivial. So, assume
that $u \in C_i$ and $v \in C_j$. By the triangle inequality, $\omega(u,v) \le \omega(u,x) + \omega(v,x)$.
Since $u$ and $x$ are both in $C_i$, $\omega(u,x) \le \mathrm{DIA}(C_i)$. Similarly, $\omega(v,x) \le \mathrm{DIA}(C_j)$.
Therefore, $\mathrm{DIA}(C_i \cup C_j) = \omega(u,v) \le \mathrm{DIA}(C_i) + \mathrm{DIA}(C_j)$, and this completes
the proof. □

In view of the above lemma, when considering instances of Cmsd$_\Delta$, we may
repeatedly merge pairs of clusters with nonempty intersection until the clusters
are pairwise disjoint. The merging process does not increase the total diameter
of the clusters.
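The repeated merging can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name; it only performs the set operations, and the guarantee on the total diameter relies on the triangle inequality as in Lemma 1.

```python
def merge_intersecting(clusters):
    """Union clusters that share a vertex until all remaining clusters are disjoint."""
    merged = []
    for cluster in clusters:
        current = set(cluster)
        # The kept clusters are pairwise disjoint, so it is enough to absorb
        # those that intersect the incoming cluster.
        overlapping = [m for m in merged if m & current]
        for m in overlapping:
            current |= m
            merged.remove(m)
        merged.append(current)
    return merged
```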

2.3 Transformation to Weighted Set Cover 

Our results rely on a transformation from instances of Cmsd$_\Delta$ to instances
of the weighted set cover problem. Given an instance of Cmsd$_\Delta$ along with a
nonnegative value $f$, the transformation in Figure 1 produces an instance of the
weighted set cover problem. It is clear that the transformation can be carried
out in polynomial time. The following lemma points out an important property
of the resulting set cover instance.

Lemma 2. Let $I$ denote an instance of the Cmsd$_\Delta$ problem and let $f$ be a nonnegative
number. Let $I'$ denote the instance of the weighted set cover problem
produced by the transformation in Figure 1 when $I$ and $f$ are given as inputs.
Let $\mathrm{OPT}(I)$ and $\mathrm{OPT}(I')$ denote the optimum solution values to $I$ and $I'$
respectively. Then, $\mathrm{OPT}(I') \le 2\,\mathrm{OPT}(I) + f$.
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TransformToSetCover(G(V, E), k, /) 

$f$ is a nonnegative parameter.

Output: An instance of weighted set cover problem with base set $Q$, and collection
$\mathcal{W}$ of nonempty subsets of $Q$, each with a weight. The weight of a set $W \in \mathcal{W}$ is
denoted by $c(W)$.

1. $Q = V$ /* Note: $|Q| = n$. */
2. $\mathcal{W} = \emptyset$
3. for each $v \in V$ do
    (a) Sort $\{\omega(v, u) : u \in V\}$ into (strictly) increasing order.
    (b) Let $a_1 = 0 < a_2 < \dots < a_{r_v}$ denote the sorted order.
    (c) for $i = 1$ to $r_v$ do
        i. Let $W_i = \{u : \omega(u, v) \le a_i\}$
        ii. Let $c(W_i) = \mathrm{DIA}(W_i) + f/k$
        iii. Add $(W_i)$ to $\mathcal{W}$
4. return($Q$, $\mathcal{W}$)

Fig. 1. Transformation from Cmsd$_\Delta$ to Weighted Set Cover
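The transformation of Fig. 1 can be sketched in Python as below. The sketch assumes a dense symmetric weight matrix with zero diagonal and 0-based vertices, and represents the set cover instance as a base set plus a list of (set, cost) pairs; this representation and the function names are illustrative choices, not taken from the paper.

```python
def dia(w, cluster):
    """Weight of a largest edge inside the cluster; 0 for singletons."""
    c = list(cluster)
    return max((w[u][v] for a, u in enumerate(c) for v in c[a + 1:]), default=0)


def transform_to_set_cover(w, k, f):
    """For every vertex v and every distinct distance a from v, emit the ball around v."""
    n = len(w)
    sets = []
    for v in range(n):
        # Distinct distances from v in increasing order; the first is w[v][v] = 0.
        for a in sorted({w[v][u] for u in range(n)}):
            ball = frozenset(u for u in range(n) if w[v][u] <= a)
            sets.append((ball, dia(w, ball) + f / k))
    return set(range(n)), sets
```

Each vertex contributes at most n balls, so the instance has at most n^2 sets, in line with the remark that the transformation runs in polynomial time.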



Proof. Let $C_1, C_2, \dots, C_k$ denote the clusters in an optimal solution to $I$. Thus,
$\mathrm{OPT}(I) = \sum_{i=1}^{k} \mathrm{DIA}(C_i)$. We will show that there is a subcollection of $k$ sets
in $I'$ such that the sets in the subcollection together cover the base set $Q$ and
the total weight of the sets in the subcollection is at most $2\,\mathrm{OPT}(I) + f$. The
lemma would then follow immediately.

Consider each cluster $C_i$ ($1 \le i \le k$) in the optimal solution to $I$. If $C_i$
contains two or more nodes, let $v_i$ be a node in $C_i$ such that $v_i$ is one of the
endpoints of an edge whose weight is equal to $\mathrm{DIA}(C_i)$. If $C_i$ contains only one
node (i.e., $\mathrm{DIA}(C_i) = 0$), let $v_i$ be that node. Now, by the transformation of
Figure 1, $I'$ has a set, say $W_i$, that includes all the nodes which are at a distance
of at most $\mathrm{DIA}(C_i)$ from $v_i$. By the triangle inequality, $\mathrm{DIA}(W_i) \le 2\,\mathrm{DIA}(C_i)$.
So, $c(W_i) = \mathrm{DIA}(W_i) + f/k \le 2\,\mathrm{DIA}(C_i) + f/k$. Clearly, the subcollection
$\{W_1, W_2, \dots, W_k\}$ covers the base set $Q$. The weight of this cover is $\sum_{i=1}^{k} c(W_i)$,
which is at most $\sum_{i=1}^{k} (2\,\mathrm{DIA}(C_i) + f/k) = 2\,\mathrm{OPT}(I) + f$. This completes the
proof of the lemma. □



2.4 The Budgeted Maximum Coverage Problem 

For obtaining our approximation result for Cmsd$_\Delta$ (where the number of clusters
$k$ is a part of the problem instance), we use a known approximation result for the
Budgeted Maximum Coverage Problem (Bmcp). Below, we provide a definition
of the problem and state the necessary approximation result.

An instance of Bmcp consists of a base set $Q = \{q_1, q_2, \dots, q_n\}$, a collection
$\mathcal{W}$ of nonempty subsets of $Q$, a nonnegative weight $c(W)$ for each set $W \in \mathcal{W}$
and a nonnegative budget $B$. The goal is to choose a subcollection of sets from $\mathcal{W}$
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SO that the total cost of the chosen sets is at most B and the number of elements 
covered by the chosen sets is maximum. This problem is NP-hard since it is a 
restatement of the minimum cost set cover problem. The following approximation 
result for Bmcp is proved in |K M Nlffl| . 

Theorem 1. Bmcp can be efficiently approximated to within a factor $(1 - 1/e)$.

□ 



It is shown in !KMNih>j that the approximation algorithm referred to in 
Theorem 1 can also be used for the more general version of Bmcp where there is
a weight associated with each element of the base set, and the goal is to maximize
the weight of the elements covered by the chosen sets. For our results, the unit
weight version of Bmcp, where the weight of each element of the base set is 1,
suffices.



3 Approximating Cmsd$_\Delta$

3.1 Algorithm Overview 

We give a brief top-down description of our approximation algorithm
Approx-Cmsd$_\Delta$, and introduce the terminology used in the analysis. At all times,
Approx-Cmsd$_\Delta$ maintains a set $\mathcal{D}$ of clusters which cover all vertices in $V$, at
cost $\Phi$. We call these global clusters, since they cover all vertices in $V$. The
algorithm begins with $\mathcal{D}$ consisting of $|V|$ singleton clusters, and progresses through
a series of rounds. During each round, it constructs a vertex set $N$ by selecting
an arbitrary vertex from each of its current clusters. It then finds a clustering
$\mathcal{C}$ on $N$. We call the clusters in $\mathcal{C}$ local clusters, since they do not need to cover
all of $V$, but only $N$. As will be shown, the number of clusters $|\mathcal{C}|$ is at most
$3k[1 + \ln(|N|/k)]$. We use $\psi$ to denote their total cost. Next, Approx-Cmsd$_\Delta$
uses Merge to suitably combine the clusters in $\mathcal{D}$ into a set of just $|\mathcal{C}|$ clusters,
which cover all of $V$ at cost at most $\Phi + \psi$. This entire process is repeated until
the number of clusters in $\mathcal{D}$ is at most $10k$.

Approx-Cmsd$_\Delta$ uses FindCover to return the required $\mathcal{C}$ clusters during
each round. FindCover, in turn, iterates through at most $O(\ln(|N|/k))$ calls
to ParametricBmcp, each of which returns a set of at most $3k$ clusters which
cover all but a $1/e$ fraction of the remaining uncovered vertices from $N$. These
clusters have cost no more than $3(1 + \epsilon)\mathrm{OPT}$.

Using TransformToSetCover from Figure 1, ParametricBmcp converts
the problem to a set cover instance, and repeatedly calls the Budgeted
Maximum Coverage Approximation Algorithm Bmcp, with growing budgets,
until the budget is large enough to make Bmcp cover the required fraction of
vertices. A complete description of the approximation algorithm is given in
Figure 2.
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AppROX-CMSDA(G(y, E), k) 

Output: A set of no more than $10k$ clusters with sum of diameters
$O(\ln(|V|/k))\,\mathrm{OPT}$.

1. $\mathcal{D} = \{\{v\} : v \in V\}$
2. while ($|\mathcal{D}| > 10k$) do /* We call each iteration a Round */
    (a) $N$ = Set of vertices obtained by choosing one arbitrary vertex from each
        $D \in \mathcal{D}$.
    (b) $\mathcal{C}$ = FindCover($G(N, E')$, $k$) /* $G(N, E')$: Complete subgraph on $N$ */
    (c) $\mathcal{D}$ = Merge($\mathcal{D}$, $\mathcal{C}$)
3. return($\mathcal{D}$)

FindCover($G(N, E)$, $k$)

Output: A set of no more than $3k[1 + \ln(|N|/k)]$ clusters which cover $N$ with cost
no more than $3[1 + \ln(|N|/k)](1 + \epsilon)\mathrm{OPT}$.

1. $\mathcal{C} = \emptyset$
2. while ($N \ne \emptyset$) do
    (a) $\mathcal{C}'$ = ParametricBmcp($G(N, E)$, $k$)
    (b) $\mathcal{C} = \mathcal{C} \cup \mathcal{C}'$
    (c) $N = N - \{i : i \in C \text{ for some } C \in \mathcal{C}'\}$
    (d) $E$ = Edges in the complete subgraph induced on the new, smaller $N$
3. return($\mathcal{C}$)

ParametricBmcp($G(N, E)$, $k$)

Output: A set of no more than $3k$ clusters which cover $(1 - 1/e)|N|$ or more vertices
from $N$ with cost no more than $3(1 + \epsilon)\mathrm{OPT}$ for any fixed $\epsilon > 0$.

1. $f$ = the smallest non-zero edge weight in $G(N, E)$
2. $\mathcal{C}' = \{\{v\} : v \in N\}$
3. while ($|\mathcal{C}'| > 3k$ or $|\{v : v \in C \text{ for some } C \in \mathcal{C}'\}| < (1 - 1/e)|N|$) do
    (a) $S$ = TransformToSetCover($G(N, E)$, $k$, $f$)
    (b) $\mathcal{C}'$ = Bmcp($S$, $3f$)
    (c) $f = (1 + \epsilon) f$
4. return($\mathcal{C}'$)

Merge($\mathcal{D}$, $\mathcal{C}$)

Remark: $\mathcal{D}$, $\mathcal{C}$ are collections of vertex sets such that $\forall D \in \mathcal{D}$, $\exists C \in \mathcal{C}$ such that
($D \cap C \ne \emptyset$).

Output: A set of $|\mathcal{C}|$ vertex sets which cover all $\{v : v \in X \text{ for some } X \in \mathcal{D} \cup \mathcal{C}\}$
at cost no more than the sum of the costs of $\mathcal{C}$ and $\mathcal{D}$.

1. for each $C \in \mathcal{C}$ do
       for each $D \in \mathcal{D}$ do
           if ($C \cap D \ne \emptyset$) then
               $C = C \cup D$; $\mathcal{D} = \mathcal{D} - D$
2. return($\mathcal{C}$)

Fig. 2. Outline of Approx-Cmsd$_\Delta$
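Of the four procedures in Fig. 2, Merge is the simplest to make concrete. The Python sketch below follows its pseudocode under the stated precondition that every global cluster intersects some local cluster; the list-of-sets representation and the function name are assumptions.

```python
def merge(global_clusters, local_clusters):
    """Absorb every global cluster into the first local cluster it intersects."""
    remaining = [set(d) for d in global_clusters]
    result = []
    for c in local_clusters:
        grown = set(c)
        untouched = []
        for d in remaining:
            if grown & d:
                grown |= d  # d is merged into the growing local cluster
            else:
                untouched.append(d)
        remaining = untouched
        result.append(grown)
    return result
```

The output has exactly as many sets as there are local clusters, which is what lets each round shrink the number of global clusters.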
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3.2 Correctness of Algorithm 

To show that our algorithm runs in polynomial time and achieves the stated
performance guarantees, we analyze it from the lower level functions up to the
top level call, beginning with ParametricBmcp, and finishing with
Approx-Cmsd$_\Delta$.

Lemma 3. Given graph $G$ with optimal $k$-cluster cost OPT, ParametricBmcp
returns no more than $3k$ clusters which contain at least $(1 - 1/e)|N|$ of the
vertices from $N$. Further, the sum of diameters of the returned clusters is no more
than $3(1 + \epsilon)\mathrm{OPT}$.

Proof. By Lemma 2, the call to TransformToSetCover returns a set cover
instance with optimal solution no more than $2\,\mathrm{OPT} + f$. When $f \ge \mathrm{OPT}$, by
Theorem 1 the call to Bmcp with budget $3f \ge 2\,\mathrm{OPT} + f$ will return
sets which cover the stated number of vertices. Also, when $f > 0$, this solution
cannot have more than $3k$ clusters: each of the clusters has minimum cost $f/k$,
so any more than $3k$ clusters will have cost more than $3f$. Therefore, with any
$f \ge \mathrm{OPT}$, the call to Bmcp with budget $3f$ will return at most $3k$ clusters which
cover enough vertices.

Since we start $f$ at the smallest possible (non-zero) value (in fact, we first
implicitly test if $f = 0$ suffices), and increase it by factors of $(1 + \epsilon)$, we are
guaranteed to try a value $f$ such that $\mathrm{OPT} \le f \le (1 + \epsilon)\mathrm{OPT}$. This will occur within
$O(\log_{1+\epsilon} \mathrm{OPT})$ iterations. Since OPT is at most the maximum edge weight
(Remark 1), the number of iterations is polynomial. □

Lemma 4. Given graph $G(N, E)$ with optimal $k$-cluster cost OPT, FindCover
returns no more than $3k[1 + \ln(|N|/k)]$ clusters which cover $N$ with cost no more
than $3[1 + \ln(|N|/k)](1 + \epsilon)\mathrm{OPT}$.

Proof. By Lemma 3, each call to ParametricBmcp will return at most $3k$
clusters of cost $3(1 + \epsilon)\mathrm{OPT}$, and will leave at most $|N|/e$ of the $|N|$ vertices
uncovered. In the ensuing iterations of ParametricBmcp, we use a subset of
$N$ which certainly has an optimal $k$-clustering with cost no greater than OPT.
After $i$ iterations, we are guaranteed to have no more than $3k$ remaining vertices,
where $|N|/e^{i} \le 3k$. To upper bound $i$, notice that if $i$ is not the last iteration,
$|N|/e^{i-1} > 3k$, and $i < 1 + \ln(|N|/3k) < \ln(|N|/k)$. The final iteration generates
at most $3k$ additional singleton clusters with cost zero. Each of the $1 + \ln(|N|/k)$
iterations returns no more than $3k$ clusters, of cost at most $3(1 + \epsilon)\mathrm{OPT}$. The
lemma follows. □

Lemma 5. Merge returns $|\mathcal{C}|$ vertex sets which cover all $\{v : v \in X$ for some
cluster $X \in \mathcal{D} \cup \mathcal{C}\}$, with cost no more than the sum of the costs of $\mathcal{C}$ and $\mathcal{D}$.

Proof. Consider all $\mathcal{C} \cup \mathcal{D}$ clusters, whose cost is the sum of the costs of $\mathcal{C}$ and $\mathcal{D}$.
Since each $D \in \mathcal{D}$ intersects some $C \in \mathcal{C}$, we may replace $D$ and $C$ with $D \cup C$,
at no additional cost, by Lemma 1. This process can be continued until each
cluster in $\mathcal{D}$ has been merged into some cluster in $\mathcal{C}$. □
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Finally, we need to show that the top level function AppROX-CMSD/i does 
in fact halt within a polynomial number of iterations. To do this, we show that 
the number of clusters in T> is eventually less than 10k, and that this happens 
after no more than 0 (log 2 log 2 (n/k)) rounds. 

Our algorithm begins with $n$ vertices, and by Lemma 4 after the end of the
first round, we are left with $3k[1 + \ln(n/k)]$ clusters, each of which contributes
one vertex towards the second round. Generalizing this for all rounds, let $\mathcal{D}_i$ be
the set of global clusters at the end of round $i$, and $n_i = |\mathcal{D}_i|$. Then, $n_0 = n$, and
$n_{i-1}$ is both the number of clusters at the end of round $i - 1$ and the number of
vertices we need to cluster in round $i$. We get the recurrence

$$n_{i+1} \le 3k[1 + \ln(n_i/k)].$$

Letting $t_i = n_i/k$, we have $t_{i+1} \le 3 + 3 \ln t_i \le 6 \ln t_i$ for $t_i \ge e$. By having enough
rounds to make $t_i$ constant, we will have a total of $O(k)$ clusters. After $O(\log^* t_0)$
rounds, $t_i$ becomes a constant, but here we will instead give a simple proof that
$O(\log_2 \log_2 t_0) = O(\log_2 \log_2 (n/k))$ rounds are sufficient.

Lemma 6. After at most $5 + \log_2 \log_2 (n/k)$ rounds, $\mathcal{D}$ contains at most $10k$
clusters.

Proof. Consider the “iterating” function used to get $\log^* x$ from $\log_2 x$. For any
function $f$ such that $f(x) < x$ for sufficiently large $x$, the iterating function
is the number of times you must apply that function to get a constant. More
specifically, define the function $(f)^*(x)_C$ to be the number of times that $f(\cdot)$
must be iteratively applied to get a result less than $C$. (Thus, $(\log_2)^*(x)_1$ gives
the familiar function $\log^* x$.) Next, we use the fact that for $x > 2109$, $6 \ln x <
\sqrt{x}$. Thus, $(6 \ln)^*(x)_{2109} \le (\sqrt{\cdot})^*(x)_{2109} \le (\sqrt{\cdot})^*(x)_2$. However, $(\sqrt{\cdot})^*(x)_2 =
\lceil \log_2 \log_2 x \rceil$, so we need to iterate fewer than $\log_2 \log_2 t_0$ times before reaching
$t_i \le 2109$. One more iteration for $n$ gives us $n_{1+\log_2 \log_2 t_0} \le 3k + 3k \ln 2109 <
26k$. Applying the recursion four more times gives $n_{5+\log_2 \log_2 (n/k)} \le 10k$. □
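
The round count in Lemma 6 can be sanity-checked numerically. The sketch below is only an illustration, not part of the proof: it iterates the worst case of the recurrence, $t_{i+1} = 3 + 3 \ln t_i$, and compares the number of rounds needed to reach $t \le 10$ with the bound $5 + \log_2 \log_2 t_0$. The function names and the sample values of $t_0$ are assumptions made for this example.

```python
import math


def rounds_until_small(t0: float, target: float = 10.0, cap: int = 10_000) -> int:
    """Iterate t <- 3 + 3*ln(t) (worst case of the recurrence, divided by k)
    and count the rounds needed until t <= target."""
    t, rounds = t0, 0
    while t > target and rounds < cap:
        t = 3.0 + 3.0 * math.log(t)
        rounds += 1
    return rounds


def lemma6_bound(t0: float) -> float:
    """The round bound 5 + log2(log2(t0)) stated in Lemma 6, with t0 = n/k."""
    return 5.0 + math.log2(math.log2(t0))


if __name__ == "__main__":
    for t0 in (16.0, 1e3, 1e6, 1e12, 1e100):
        print(f"t0={t0:g}: rounds={rounds_until_small(t0)}, bound={lemma6_bound(t0):.2f}")
```

On these sample values the iteration stops well within the stated bound, which is consistent with the remark that $O(\log^* t_0)$ rounds already suffice.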

Thus, Approx-Cmsd$_\Delta$ will terminate in $O(\log \log (n/k))$ rounds. Each round
has a call to FindCover, which makes at most $O(\log (n/k))$ calls to
ParametricBmcp. Using $T(x)$ to denote the running time of Bmcp, the time taken by
all the calls to ParametricBmcp is 0((n^ log n + T(n^))log]^^j OPT). Thus, 
the running time of the approximation algorithm is 

0(loglog (n/fc)[log {n/k){n^ logn + T{n^)) log^+g OPT]). 

Since $T(x)$ is polynomial by [KMN99], so is our algorithm.

Now all that is left is to show that the total cost is no more than the stated
bound. Let $\mathcal{C}_i$ denote the set of local clusters from round $i$. Since $\mathcal{C}_i$ covers the
set of $N$ vertices, one from each $D \in \mathcal{D}_i$, we know that each $D \in \mathcal{D}_i$ intersects
a cluster $C$ in $\mathcal{C}_i$. Let $\Psi_i$ and $\psi_i$ be the sum of diameters of the global and local
clusters during the round, respectively. The following lemma can be proven
by induction on $i$.
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Lemma 7. After i rounds of AppROX-CMSD/i, If'i < 

To get the total cost of all global clusters at the end of the algorithm, we
just need to compute $\Psi_{5+\log_2 \log_2 (n/k)}$, since it was shown in Lemma 6 that the
number of rounds is at most $5 + \log_2 \log_2 (n/k)$.

Lemma 8. $\Psi_{5+\log_2 \log_2 (n/k)} = \mathrm{OPT} \cdot O(\ln(n/k))$.

Proof Note that by Lemma Q tf's+iog^ log^ (n/fe) = ("/*) ^.^ gy g^p_ 

arating the summation into the first term and all others, and noticing that Ui 
is decreasing (i.e., all terms with Ui>i are upper bounded by ni), we get that 
the first term in the summation is 3[1 + ln(n/fc)](l + e)OPT, and the rest of 
the terms are OPT • 0((log2log2 (n/fc))^). For large enough n/k, the first term 
dominates all of the rest, so for some constant e' > e, the cost is no more than 
3[1 + In (n/fc)](l + e')OPT = OPT • 0(log2(n/fc)), with small constant terms. 

□ 

Summarizing the above discussion, we have: 

Theorem 2. There is a polynomial time approximation algorithm for the
Cmsd$_\Delta$ problem that returns at most $10k$ clusters whose total diameter is at
most $O(\ln(n/k))$ times the optimal solution value with $k$ clusters. □

3.3 An Approximation Algorithm for Cmsd$_\Delta$ with Fixed $k$

When $k$ is fixed, it is possible to obtain a simple 2-approximation algorithm for
the Cmsd$_\Delta$ problem using the transformation shown in Figure ^. We present
this result below.

Theorem 3. When the number of clusters $k$ is fixed, there is a 2-approximation
algorithm for Cmsd$_\Delta$.

Proof. The steps of the approximation algorithm are as follows. 

1. Using the transformation of Figure ^, construct an instance of the minimum
cost set cover problem from the given instance of Cmsd$_\Delta$ with the parameter
$f$ set to zero.

2. Find a minimum cost set cover consisting of at most k sets. Since k is fixed, 
this step can be done in polynomial time by exhaustive search. 

3. If the sets in the collection obtained in Step 2 are not pairwise disjoint, then
repeatedly merge pairs of sets with nonempty intersection until the collection
is pairwise disjoint.

4. Output the collection of sets found in Step 3 as the solution to the Cmsd$_\Delta$
instance.
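
Steps 2 and 3 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the set cover instance is taken as given (a list of candidate sets with costs, standing in for the output of the transformation in Step 1, which is not reproduced here), and the names `min_cost_cover` and `merge_overlapping` are hypothetical.

```python
from itertools import combinations


def min_cost_cover(universe, candidates, k):
    """Step 2 (sketch): exhaustive search over all sub-collections of at most k
    candidate sets. `candidates` is a list of (frozenset, cost) pairs.
    Returns (cost, list_of_sets) for a cheapest cover, or None if none exists."""
    best = None
    for size in range(1, k + 1):
        for combo in combinations(candidates, size):
            covered = set().union(*(s for s, _ in combo))
            if covered >= universe:
                cost = sum(c for _, c in combo)
                if best is None or cost < best[0]:
                    best = (cost, [set(s) for s, _ in combo])
    return best


def merge_overlapping(sets):
    """Step 3 (sketch): repeatedly merge two sets with nonempty intersection
    until the collection is pairwise disjoint."""
    sets = [set(s) for s in sets]
    merged = True
    while merged:
        merged = False
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    sets[i] |= sets[j]
                    del sets[j]
                    merged = True
                    break
            if merged:
                break
    return sets


if __name__ == "__main__":
    universe = {1, 2, 3, 4, 5}
    candidates = [
        (frozenset({1, 2, 3}), 2),
        (frozenset({3, 4}), 1),
        (frozenset({5}), 0),
        (frozenset({1, 2, 3, 4, 5}), 10),
        (frozenset({4, 5}), 3),
    ]
    cost, cover = min_cost_cover(universe, candidates, k=3)
    print(cost, merge_overlapping(cover))
```

The exhaustive search examines $O(m^k)$ sub-collections for $m$ candidate sets, which is polynomial only because $k$ is fixed.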

Clearly, the approximation algorithm runs in polynomial time. Applying
Lemma 0 with $f = 0$, the cost of an optimal set cover is at most twice the
optimal solution value of the Cmsd$_\Delta$ instance. Step 2 finds an optimal solution
to the set cover problem, and by Lemma 5 the merging operations in Step 3 do
not increase the total diameter of the clusters. Thus, the total diameter of the
clusters produced is at most twice the optimal value. □

4 Non-approximability Results

4.1 Non-approximability without Triangle Inequality 

We show that, unless P = NP, Cmsd cannot be efficiently approximated to 
within any factor even when the number of clusters is fixed at 3. This result 
can be established through a simple modification to the reduction from Graph 
3-colorability (3 -Color) to Cmsd given in |Hr7Sj . We omit this proof due to 
space constraints. 

Proposition 1. Unless P = NP, for any $p > 1$, no polynomial time algorithm
for the Cmsd problem can provide a performance guarantee of $p$. □

This non-approximability result should be contrasted with the known result 
that the Cmsd problem is solvable in polynomial time for 2 clusters IH,ls7IMMshl . 

4.2 A Non-approximability Result for Cmsd$_\Delta$

Here, we prove our non-approximability result for Cmsd$_\Delta$. We establish this
result through a reduction from the well known Clique problem unini. 

Proposition 2. Unless P = NP, for any $\epsilon > 0$, no polynomial time algorithm
for the Cmsd$_\Delta$ problem can provide a solution which satisfies the bound on the
number of clusters and whose total diameter is within a factor $2 - \epsilon$ of the optimal
value.

Proof. We use a reduction from the Clique problem. Let the undirected graph
$G(V, E)$ and integer $J \le |V|$ denote an arbitrary instance of the Clique problem.
We construct an instance of the Cmsd$_\Delta$ problem consisting of a complete edge
weighted graph $G'$ on the vertex set $V$ as follows. For any pair of vertices $u$ and
$v$, the weight of $\{u,v\}$ is set to 1 if $\{u,v\}$ is an edge in $E$ and to 2 otherwise.
Since every weight is 1 or 2, the resulting edge weights satisfy the triangle inequality. The number
of clusters $k$ is set to $|V| - J + 1$. Now, it is straightforward to see that if $G$
has a clique with $J$ or more vertices, then $G'$ can be partitioned into at most $k$
clusters with a total diameter of 1: the vertices of the clique form one cluster of
diameter 1 and each of the remaining $|V| - J$ vertices forms a separate cluster
with a diameter of zero. Further, if $G$ does not have a clique with $J$ or more
vertices, then any solution with at most $k$ clusters must have a total diameter
of at least 2. The proposition follows. □
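
The reduction can be illustrated on a tiny instance. The sketch below builds the weighted complete graph and the cluster bound $k$ from a Clique instance, then finds the best total diameter by brute force over all partitions, which is feasible only for a handful of vertices. The function names and the sample graph are assumptions made for this example.

```python
from itertools import combinations


def clique_to_cmsd(vertices, edges, J):
    """Build the instance from the proof: weight 1 on edges of G, weight 2 on
    non-edges, and k = |V| - J + 1 clusters."""
    edge_set = {frozenset(e) for e in edges}
    weight = {
        frozenset((u, v)): (1 if frozenset((u, v)) in edge_set else 2)
        for u, v in combinations(vertices, 2)
    }
    return weight, len(vertices) - J + 1


def diameter(cluster, weight):
    """Largest pairwise weight inside a cluster (0 for a singleton)."""
    return max((weight[frozenset(p)] for p in combinations(cluster, 2)), default=0)


def partitions(items):
    """Yield every partition of a list into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def best_total_diameter(vertices, weight, k):
    """Brute-force optimum over all partitions into at most k clusters."""
    return min(
        sum(diameter(c, weight) for c in part)
        for part in partitions(list(vertices))
        if len(part) <= k
    )


if __name__ == "__main__":
    V = [0, 1, 2, 3, 4]
    E = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]  # a triangle with a pendant path
    for J in (3, 4):
        weight, k = clique_to_cmsd(V, E, J)
        print(J, k, best_total_diameter(V, weight, k))
```

For this graph, which has a clique of size 3 but none of size 4, the optimum is 1 when $J = 3$ and 2 when $J = 4$, matching the gap used in the proof.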

5 Other Results 

In this section, we briefly mention our other results on the Cmsd problem. Details 
concerning these results will appear in a complete version of the paper. 

We have considered the Cmsd problem when the underlying graph is a tree 
with edge weights (rather than a complete graph). In this version, the distance 
between any pair of nodes is the length of the path between the nodes in the
tree. For this problem, we have developed a polynomial time algorithm using
dynamic programming. This algorithm uses 0{k'n?) space and runs in 0{k^nP) 
time. It can also be extended to work for graphs of bounded treewidth. 

We have also considered the clustering problem where the goal is to minimize 
the sum of the radii of the clusters (rather than the sum of the diameters). To 
discuss these results, we first recall the definition of cluster radius. Let $C$ be a
cluster. For any node $v$ in $C$, let $d_v$ denote the maximum distance between $v$ and
any other node in $C$. The radius of $C$ is given by $\min\{d_v : v \in C\}$. A node $v$ for
which $d_v$ is equal to the radius of $C$ is a center of $C$. When edge weights satisfy
the triangle inequality, the diameter of a cluster is at most twice the radius.
Therefore, our approximation result for Cmsd$_\Delta$ (Section 3) carries over (with a
different constant within the big-O) to the clustering problem where the goal is
to minimize the sum of the radii. We have also been able to show an interesting
contrast between the diameter and radius problems for the non-metric case. For
fixed $k$, while it is NP-hard to obtain even an approximation for the non-metric
version of the diameter problem (Section 4.1), the corresponding problem for
radius can be solved in polynomial time.
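
The definitions of radius, center, and diameter translate directly into code. The following sketch is only an illustration; the helper names and the sample points on a line are assumptions made for this example.

```python
from itertools import combinations


def diameter(cluster, dist):
    """Maximum distance between any two nodes of the cluster."""
    return max((dist(u, v) for u, v in combinations(cluster, 2)), default=0)


def radius_and_centers(cluster, dist):
    """Return (radius, centers): d_v is the largest distance from v to any
    other node, the radius is the smallest d_v, and centers attain it."""
    d = {v: max((dist(v, u) for u in cluster if u != v), default=0) for v in cluster}
    r = min(d.values())
    return r, [v for v in cluster if d[v] == r]


if __name__ == "__main__":
    points = [0, 1, 3, 7]  # points on a line, a metric under |u - v|
    dist = lambda u, v: abs(u - v)
    r, centers = radius_and_centers(points, dist)
    print(r, centers, diameter(points, dist))
```

Here the radius is 4 (attained at the point 3) and the diameter is 7, in line with the bound that the diameter is at most twice the radius under the triangle inequality.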
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Abstract. We consider complete graphs with nonnegative edge weights. 
A $p$-matching is a set of $p$ disjoint edges. We prove the existence of a
maximal (with respect to inclusion) matching $M$ that contains for any
$p \le |M|$ $p$ edges whose total weight is at least $\frac{1}{\sqrt{2}}$ of the maximum weight
of a $p$-matching. We use this property to approximate graph partitioning
problems in which the sizes of the parts of the partitioning are given.



1 Introduction 

Let $G = (V, E)$ be a complete graph with vertex set $V$ such that $|V| = n$, edge
set $E$, and edge weights $w(u, v) \ge 0$, $(u, v) \in E$. A $p$-matching is a set of $p$
disjoint edges in a graph. A $p$-matching with $p = \lfloor n/2 \rfloor$ is called perfect. A perfect
matching $M$ that contains for any $p \le |M|$ a $p$-matching whose weight is at least
$\alpha$ times the maximum weight of a $p$-matching is said to be $\alpha$-robust. We prove
that $G$ contains a $\frac{1}{\sqrt{2}}$-robust matching. On the other hand, there are graphs that
do not contain an $\alpha$-robust matching for any $\alpha > \frac{1}{\sqrt{2}}$.

In Section 2 we generalize the robustness concept to independence systems.
Our theorem on robust matchings is proved in Section 3, and we use it to approximate
within a factor of $\frac{1}{\sqrt{2}}$ the following problem: Given constants $c_1 \ge c_2 \ge \cdots \ge
c_p$, find a $p$-matching $M$ that maximizes $\sum_{i=1}^{p} c_i w_i$, where $w_1 \ge w_2 \ge \cdots \ge w_p$
are the edge weights in $M$.

In Section 4 we use these results to approximate the MAXIMUM CLUSTERING
PROBLEM WITH GIVEN SIZES OF THE SIDES, in which the goal is to partition the
vertex set into subsets of given sizes maximizing the total edge weight within the
same cluster. In the full version of this paper we also apply our results here to
approximate within the same bound the MAXIMUM CAPACITATED STAR PACKING
PROBLEM, in which it is also required to locate a center within each cluster and
the goal is to maximize the total distance from each vertex to its center. In both
cases we assume that the edge weights satisfy the triangle inequality.

For $V' \subseteq V$ we denote by $E(V')$ the edge set of the subgraph induced by $V'$.
For $E' \subseteq E$ we denote by $W(E')$ the total weight of edges in $E'$.

For an optimization problem under consideration we denote by opt the optimal
solution value and by apx the approximate value returned by a given
approximation algorithm. Some of the proofs are omitted in this extended abstract
and will appear in the full version of the paper.

2 Robust Independent Sets

An independence system is a pair $(E, \mathcal{F})$ consisting of a ground set $E$ and a
collection $\mathcal{F}$ of independent sets, or equivalently, feasible solutions, such that $F' \subseteq
F \in \mathcal{F}$ implies $F' \in \mathcal{F}$. Let $w_e \ge 0$, $e \in E$, be weights attached to the elements of
$E$. The problem of computing an independent set of maximum weight generalizes
many interesting combinatorial optimization problems. Korte and Hausmann
analyzed the performance of the greedy algorithm for the above problem. The
algorithm sorts the elements by weight and inserts them into the solution starting
with the heaviest one and excluding an element if its addition would generate a
set not in $\mathcal{F}$. They proved the following theorem:

Theorem 2.1 For any $E' \subseteq E$ define $l(E')$ and $u(E')$ to be the smallest and
largest cardinality, respectively, of a maximal (with respect to inclusion) independent
set contained in $E'$. Let $r(E, \mathcal{F}) = \min_{E' \subseteq E} \frac{l(E')}{u(E')}$. Then the greedy solution
is an $r(E, \mathcal{F})$-approximation, that is, the value of the greedy solution is at least
$r(E, \mathcal{F})$ times the optimal value.
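
The greedy algorithm analyzed above can be sketched generically, with the independence system supplied as a membership oracle. This is an illustrative sketch, not code from the paper; the names `greedy`, `is_matching`, and the sample path graph are assumptions made for this example.

```python
def greedy(elements, weight, is_independent):
    """Scan elements from heaviest to lightest and keep an element whenever
    the solution stays independent."""
    solution = []
    for e in sorted(elements, key=weight, reverse=True):
        if is_independent(solution + [e]):
            solution.append(e)
    return solution


def is_matching(edges):
    """Independence oracle for matchings: no vertex may be used twice."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


if __name__ == "__main__":
    # Path 1-2-3-4: greedy takes the middle edge (weight 3) and is then stuck,
    # while the optimum takes the two outer edges (total weight 4).
    w = {(1, 2): 2, (2, 3): 3, (3, 4): 2}
    sol = greedy(list(w), w.__getitem__, is_matching)
    print(sol, sum(w[e] for e in sol))
```

On this path the greedy value is 3 against an optimum of 4, comfortably within the guarantee of Theorem 2.1 for matchings.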

Consider now the following game: You choose a maximal independent set in 
E. An adversary then selects p G {1, ..., [nj}. Finally, you output the p heaviest 
elements of your solution. By the definition of an independence system, the 
output is independent. Your payoff is the ratio between the weight of your output 
and the maximum weight of an independent set whose cardinality is at most p. 
A solution is $\alpha$-robust if it guarantees that the payoff is at least $\alpha$.

Theorem 2.2 The greedy solution is $r(E, \mathcal{F})$-robust.

The edges and matchings in a graph constitute an independence system for
which $r = \frac{1}{2}$. It follows that the greedy solution is $\frac{1}{2}$-robust. We obtain
stronger results in the next section.

Let $c_1 \ge c_2 \ge \cdots \ge c_m \ge 0$ be given constants. For an independent set
$F = \{e_1, \ldots, e_m\}$ with weights $w_1, w_2, \ldots, w_m$ define $C(F) = \sum_{j=1}^{m} c_j w_j$. Since
we are interested in obtaining large values of $C(F)$, we will assume that for any
given matching the edges are numbered so that $w_1 \ge w_2 \ge \cdots \ge w_m$. Thus,
$C(F)$ is well defined for any set $F$ without explicitly specifying an order on its
edges. We will also denote $F_p = \{e_1, \ldots, e_p\}$, $p = 1, \ldots, m$, and $F_p = F$ for $p > m$.

Problem 2.1. Compute $F \in \mathcal{F}$, $|F| \le p$, that maximizes $C(F_p)$.

The following theorem was proved by Gerhard Woeginger.

Theorem 2.3 Problem 2.1 is NP-hard even when $\mathcal{F}$ is the set of matchings in
a graph with edge set $E$ ($F \subseteq E$ is in $\mathcal{F}$ if it consists of vertex-disjoint edges).



Theorem 2.4 Let $F$ and $F'$ be independent sets. If $F'$ is $\alpha$-robust then $C(F'_p) \ge
\alpha C(F_p)$ for every $p = 1, 2, \ldots$ and any constants $c_1 \ge c_2 \ge \cdots \ge c_m \ge 0$.

Proof: Let $w_j = 0$ for $j > |F|$. Let $w_1 \ge w_2 \ge \cdots \ge w_p$ and $w'_1 \ge w'_2 \ge \cdots \ge w'_p$
be the edge weights of $F_p$ and $F'_p$, respectively. Then,

$$
\begin{aligned}
C(F'_p) &= \sum_{j=1}^{p-1} (c_j - c_{j+1}) \sum_{i=1}^{j} w'_i + c_p \sum_{i=1}^{p} w'_i \\
&= \sum_{j=1}^{p-1} (c_j - c_{j+1})\, W(F'_j) + c_p\, W(F'_p) \\
&\ge \sum_{j=1}^{p-1} (c_j - c_{j+1})\, \alpha W(F_j) + c_p\, \alpha W(F_p) \\
&= \alpha \sum_{i=1}^{p} c_i w_i = \alpha\, C(F_p).
\end{aligned}
$$



3 Robust Matchings 

A matching is a set of vertex-disjoint edges. The weight of a matching is the total 
weight of its edges. A maximum matching is a matching with maximum weight. 
A $p$-matching is a matching with $p$ edges. We denote by $m = \lfloor n/2 \rfloor$ the maximum
number of edges in a matching. An $m$-matching is said to be perfect.

For a perfect matching $M$ we define $M_p$ to be the set of its $p$ heaviest edges,
$p = 1, \ldots, m$. We denote by $M^{(p)}$ a maximum $p$-matching. A matching $M$ is $\alpha$-robust
if

$$W(M_p) \ge \alpha\, W(M^{(p)}), \quad p = 1, \ldots, m.$$

In this section we show that for every graph there exists a $\frac{1}{\sqrt{2}}$-robust matching
and that it can be constructed by a single application of a maximum matching
algorithm. The following example shows that the value of $\frac{1}{\sqrt{2}}$ cannot be
increased.

Consider a 4-vertex graph with weights $w(1, 2) = w(3, 4) = 1$, $w(2, 3) = \sqrt{2}$,
and all other edges have zero weight. For this graph $W(M^{(1)}) = \sqrt{2}$ and $W(M^{(2)}) =
2$. The graph has three perfect matchings and none is $\alpha$-robust for $\alpha > \frac{1}{\sqrt{2}}$. The
matchings $\{(1, 2), (3, 4)\}$ and $\{(2, 3), (1, 4)\}$ are $\frac{1}{\sqrt{2}}$-robust and $\{(1, 3), (2, 4)\}$ is
0-robust.

Theorem 3.1 Let $S$ be a maximum perfect matching with respect to the squared
weights $w^2(e)$, $e \in E$. Then $S$ is $\frac{1}{\sqrt{2}}$-robust.
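
Both the 4-vertex example and the statement of Theorem 3.1 can be checked by brute force on small graphs. The sketch below is an illustration only: it enumerates all perfect matchings instead of calling a maximum matching algorithm, so it is usable only for a handful of vertices, and all function names are assumptions made for this example.

```python
import itertools
import math
import random


def perfect_matchings(vertices):
    """Yield every perfect matching of the complete graph (even vertex count)."""
    vertices = list(vertices)
    if not vertices:
        yield []
        return
    u, rest = vertices[0], vertices[1:]
    for i, v in enumerate(rest):
        for m in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield [(u, v)] + m


def max_p_matching_weight(vertices, w, p):
    """Maximum total weight of p vertex-disjoint edges, by exhaustive search."""
    best = 0.0
    for combo in itertools.combinations(itertools.combinations(vertices, 2), p):
        if len({x for e in combo for x in e}) == 2 * p:
            best = max(best, sum(w[e] for e in combo))
    return best


def robustness(matching, vertices, w):
    """Smallest ratio W(M_p) / W(M^(p)) over p = 1, ..., |M|."""
    weights = sorted((w[e] for e in matching), reverse=True)
    alpha = 1.0
    for p in range(1, len(matching) + 1):
        opt = max_p_matching_weight(vertices, w, p)
        if opt > 0:
            alpha = min(alpha, sum(weights[:p]) / opt)
    return alpha


def squared_weight_matching(vertices, w):
    """Perfect matching maximizing the sum of squared weights (brute force)."""
    return max(perfect_matchings(vertices), key=lambda m: sum(w[e] ** 2 for e in m))


if __name__ == "__main__":
    V = [1, 2, 3, 4]
    w = {e: 0.0 for e in itertools.combinations(V, 2)}
    w[(1, 2)] = w[(3, 4)] = 1.0
    w[(2, 3)] = math.sqrt(2)
    for m in perfect_matchings(V):
        print(m, round(robustness(m, V, w), 4))

    random.seed(0)
    V6 = list(range(6))
    w6 = {e: random.random() for e in itertools.combinations(V6, 2)}
    S = squared_weight_matching(V6, w6)
    print(S, round(robustness(S, V6, w6), 4))
```

Edges are keyed as pairs `(u, v)` with `u < v`, which is why the vertex lists are kept sorted.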

The rest of this section is devoted to proving Theorem 3.1. We will prove it by
treating the squared edge weights as variables whose sizes are to be determined in
order to form a contradiction to the theorem. We will prove that to achieve such
a contradiction we may make several assumptions on these variables. Finally
these assumptions will lead to the conclusion that the claim is true.

Consider the set $S \cup M^{(p)}$. It consists of a collection of disjoint paths and
cycles. A path may consist of a single edge or it alternates between $S$ and $M^{(p)}$.
Since $S$ is perfect, the end edges of the path are from $S$ except possibly one
end of one path in the case of odd $n$ (since in this case there is exactly one
vertex which is not incident to an edge of $S$). A cycle alternates between $S$ and
$M^{(p)}$. We will construct from the edges of $S$ a $p$-matching whose weight is at
least $\frac{1}{\sqrt{2}} W(M^{(p)})$. Since the weight of this matching is at most the weight of the $p$
heaviest edges in $S$, this construction will prove the theorem.

We choose a $p$-matching from $S$ as follows: Every edge in $S \cap M^{(p)}$ is chosen.
All of the edges of $S$ contained in a cycle of $S \cup M^{(p)}$ are chosen. From every
nontrivial path (containing more than a single edge) of $S \cup M^{(p)}$ we choose all
the edges that belong to $S$ except for the lightest one. There is one exception to
the last rule: If ($n$ is odd and) there is a path with only one end edge from $S$ then
we choose all of the $S$-edges of this path. The total number of edges selected is
equal to $|M^{(p)}| = p$. It is sufficient to prove that the claimed bound on the ratio
of the edge weights in $S$ and in $M^{(p)}$ holds for every such path and cycle.

Consider a nontrivial path $P$ with squared weights $x_1, y_1, x_2, y_2, \ldots, y_{r-1}, x_r$,
where the $x$ values correspond to the edges of $S$ and the $y$ values correspond to
the edges of $M^{(p)}$ in the order they appear on $P$.

We denote $x_{[i,j]} = \sum_{l=i}^{j} x_l$ and similarly $y_{[i,j]} = \sum_{l=i}^{j} y_l$. We are interested
in subpaths $P_{i,j}$ of $P$ consisting of the edges whose weights are $x_i, y_i, \ldots, y_{j-1}, x_j$.
Note that $P = P_{1,r}$. Since $S$ is maximum with respect to the squared weights,

$$x_{[i,j]} \ge y_{[i,j-1]}, \quad 1 \le i \le j \le r. \tag{1}$$

Let $x_{\min} = \min\{x_i \mid i = 1, \ldots, r\}$. Our goal is to prove that the ratio of the
total weight of the $r - 1$ heaviest edges in $P \cap S$ to the weight of $P \cap M^{(p)}$ is at
least $\frac{1}{\sqrt{2}}$, that is,

$$Z = \frac{\sum_{i=1}^{r} \sqrt{x_i} - \sqrt{x_{\min}}}{\sum_{i=1}^{r-1} \sqrt{y_i}} \ge \frac{1}{\sqrt{2}}$$

for all $x, y$ that satisfy (1).

We will prove that $Z \ge \frac{1}{\sqrt{2}}$ for every nontrivial path by induction on $r$.
Note that the proof and induction hypothesis apply to any nontrivial path $P$ in
$S \cup M^{(p)}$, not just to maximal (with respect to inclusion) paths. A subpath is
subject to additional constraints arising from longer subpaths that contain it,
but these constraints may only increase the lower bound on $Z$ for the subpath
in question.

Lemma 3.2 $Z \ge \frac{1}{\sqrt{2}}$ when $r = 2$.

Lemma 3.3 $Z \ge \frac{1}{\sqrt{2}}$ when $r = 3$.

We now proceed to proving the general step of the induction for $r > 3$. Thus
we assume that the claim holds for smaller $r$ values.



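Lemma 3.4 We can assume that $x_j > x_{\min}$, $j = 2, \ldots, r - 1$.

Proof: Suppose that $x_j = x_{\min}$ for some $j \in \{2, \ldots, r - 1\}$. Then,

$$
Z = \frac{\left(\sum_{i=1}^{j} \sqrt{x_i} - \sqrt{x_{\min}}\right) + \left(\sum_{i=j}^{r} \sqrt{x_i} - \sqrt{x_{\min}}\right)}{\sum_{i=1}^{j-1} \sqrt{y_i} + \sum_{i=j}^{r-1} \sqrt{y_i}}
\ge \min\left\{ \frac{\sum_{i=1}^{j} \sqrt{x_i} - \sqrt{x_{\min}}}{\sum_{i=1}^{j-1} \sqrt{y_i}},\ \frac{\sum_{i=j}^{r} \sqrt{x_i} - \sqrt{x_{\min}}}{\sum_{i=j}^{r-1} \sqrt{y_i}} \right\}.
$$

Since $x_j = \min\{x_i \mid i = 1, \ldots, j\} = \min\{x_i \mid i = j, \ldots, r\}$, it follows from the
induction hypothesis that $Z \ge \frac{1}{\sqrt{2}}$. ■

We call a subpath $P_{i,j}$ for which $x_{[i,j]} = y_{[i,j-1]}$ tight.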



Lemma 3.5 (i) Let $i \le k \le j \le l$ such that $i < j$ and $k < l$. If $k < j$ and both
$P_{i,j}$ and $P_{k,l}$ are tight then so is $P_{k,j}$. (ii) Let $i < j < k$. If $P_{i,j}$ is tight then $P_{j,k}$
is not.

Proof: (i) By assumption, $x_{[i,j]} = y_{[i,j-1]}$ and $x_{[k,l]} = y_{[k,l-1]}$. Summing these
equations we get

$$x_{[i,l]} + x_{[k,j]} = y_{[i,l-1]} + y_{[k,j-1]}.$$

Since $x_{[i,l]} \ge y_{[i,l-1]}$ and $x_{[k,j]} \ge y_{[k,j-1]}$ it follows that both of the latter relations
satisfy equality and the respective subpaths are tight.

(ii) From the same equation with $j = k$ it follows that $x_j = 0$ and $1 < j < r$, in
contrast to Lemma 3.4. ■

Suppose that $r > 3$. Let $1 < j < r$. We can assume that there exists a tight
interval containing $e_j$, otherwise we reduce $x_j$ till some subinterval containing $e_j$
becomes tight, and this change reduces $Z$. Consider the intersection of all tight
intervals containing $e_j \in S$. It follows from Lemma 3.5 that the intersection is
a non-trivial tight subpath. Again by this lemma, the $x$ values in this subpath
share the same set of tight subpaths and therefore we can assume that the sum
of their square roots is minimized subject to a single constraint on their sum.
By concavity of the square root function, this objective is attained by setting
all of these values to 0 except for a single one, say $x_k > 0$. From Lemma 3.4
and since $x_{\min} \ge 0$, it follows that either $r \le 3$ or $r = 4$. For the former case
the claim has already been proved in Lemmas 3.2 and 3.3. In the latter case, it
must be that $P_{1,2}$ and $P_{3,4}$ are tight and thus $x_1 = x_4 = y_2 = 0$ while $y_1 = x_2$
and $y_3 = x_3$. In this case $Z = 1$ and this completes the proof for paths with two
ends from $S$.

For a path with only one end edge from $S$ we may assume that a fictitious $S$-edge
of zero weight is added at that end. The set of constraints (1) then extends
in a natural way and the same proof holds.

Suppose now that there is a cycle $C$ that contradicts the claim. We will show
how to construct an instance consisting of a path that contradicts the claim.
Since we have already proved that this is impossible, it will follow that such
a cycle cannot exist. Specifically, we form a path by cutting $C$ at an arbitrary
vertex and joining many copies of $C$ pasted at the cut point. Finally add an
$x$ edge at the end where it is missing with a sufficiently large weight, such as
$W(C \cap M^{(p)})$, so that (1) is satisfied. The path obtained this way will have
(asymptotically, as the number of pasted copies increases) the same $Z$-value as
$C$. This concludes the proof of Theorem 3.1.

4 Clustering 

In the MAXIMUM CLUSTERING PROBLEM, the goal is to partition the vertex set
$V$ into sets of given sizes so that the total weight of edges inside the clusters is
maximized. We treat a version of the problem in which cluster sizes $c_1 \ge c_2 \ge
\cdots \ge c_p \ge 1$ such that $c_1 + \cdots + c_p = n$ are given. In the uniform version,
$c_1 = c_2 = \cdots = c_p$. We consider the problem under the triangle inequality
assumption.

Feo and Khellaf |2| treated the uniform case and developed a polynomial 
algorithm whose error ratio is bounded by 2 {c-i) where c = ^ is the 

cluster’s size and it is even or odd, respectively. The bound decreases to | as c 
approaches oo. The algorithm’s time complexity is dominated by computation of 
a maximum weight perfect matching. (Without the triangle inequality assump- 
tion, the bound is or respectively, but Feo, Goldschmidt and Khellaf P 
improved the bound to ^ in the cases of c = 3 and c = 4.) We describe an 
alternative algorithm for the uniform case that achieves the ratio of ^ and has 
a lower 0{n^) complexity. 

Hassin, Rubinstein and Tamir generalized the algorithm of |2| and ob- 
tained a bound of | for computing k clusters of size c each (1 < /c < ") with 
maximum total weight. Our discussion concerning the uniform case does not 
apply to this generalization. 

We first state some results concerning the uniform case. Consider the set of 
partitions of V into clusters of size c each. A random solution is obtained by 
randomly selecting such a partition. 

Theorem 4.1 In the uniform case, under the triangle inequality assumption, 
the expected weight of a random solution is at least ^opt. 

The algorithm can be easily derandomized while preserving its performance 
guarantee. 

We now treat the general case. Given $c_1 \ge c_2 \ge \cdots \ge c_p$, we want to partition $V$
into clusters of these sizes maximizing their total weight. We note that a random
solution may have a very small weight relative to opt.

Let $d_j = \lfloor c_j/2 \rfloor$, $D_j = d_1 + \cdots + d_j$, $j = 1, \ldots, p$, and $D_0 = 0$. We propose the
following algorithm:

Algorithm 4.2

1. Compute a maximum matching $S$ with respect to the squared weights. Let
$S = \{(u_j, v_j) \mid j = 1, \ldots, \lfloor n/2 \rfloor\}$, where $w(u_j, v_j) \ge w(u_{j+1}, v_{j+1})$, $j = 1, \ldots, \lfloor n/2 \rfloor - 1$.

2. Set $V_i = \{u_j, v_j \mid j = D_{i-1} + 1, \ldots, D_i\}$, $i = 1, \ldots, p$.

3. For each $i$ such that $c_i$ is odd, add to $V_i$ an arbitrary yet unassigned vertex.
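
The three steps of Algorithm 4.2 can be sketched as below. This is a minimal illustration under stated assumptions: the maximum matching with respect to squared weights is found by brute-force enumeration (a stand-in for a real maximum weight matching algorithm, practical only for very small $n$), the weight is passed as a function `w(u, v)`, and the names `full_matchings` and `cluster_by_matching` are hypothetical.

```python
def full_matchings(vertices):
    """Yield every matching with floor(n/2) edges in the complete graph."""
    vs = list(vertices)
    if len(vs) < 2:
        yield []
        return
    if len(vs) % 2 == 1:
        # Odd number of vertices: leave each vertex out in turn.
        for i in range(len(vs)):
            yield from full_matchings(vs[:i] + vs[i + 1:])
        return
    u, rest = vs[0], vs[1:]
    for i, v in enumerate(rest):
        for m in full_matchings(rest[:i] + rest[i + 1:]):
            yield [(u, v)] + m


def cluster_by_matching(vertices, w, sizes):
    """Sketch of Algorithm 4.2; `sizes` must be non-increasing and sum to n."""
    vertices = list(vertices)
    assert sum(sizes) == len(vertices)
    assert list(sizes) == sorted(sizes, reverse=True)
    # Step 1: maximum matching w.r.t. squared weights, edges sorted by weight.
    S = max(full_matchings(vertices), key=lambda m: sum(w(u, v) ** 2 for u, v in m))
    S.sort(key=lambda e: w(*e), reverse=True)
    # Step 2: cluster i receives the next d_i = floor(c_i / 2) matching edges.
    clusters, start = [], 0
    for c in sizes:
        d = c // 2
        clusters.append([x for e in S[start:start + d] for x in e])
        start += d
    # Step 3: each odd-sized cluster receives one still unassigned vertex.
    assigned = {x for cl in clusters for x in cl}
    free = [v for v in vertices if v not in assigned]
    for cl, c in zip(clusters, sizes):
        if c % 2 == 1:
            cl.append(free.pop())
    return clusters


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12, 20]  # points on a line; |u - v| is a metric
    print(cluster_by_matching(points, lambda u, v: abs(u - v), [3, 2, 2]))
```

Because the objective is to maximize the weight inside clusters, the heaviest matching edges are given to the largest clusters, which is where the ordering $c_1 \ge \cdots \ge c_p$ is used.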



Theorem 4.3

$$apx \ge \frac{1}{2\sqrt{2}}\, opt.$$



Proof: Consider an optimal partition $O_1, \ldots, O_p$. Let $M_i$ be a maximum matching
in the subgraph induced by $O_i$, $i = 1, \ldots, p$. Denote the edge weights in $M_i$ by
$w^i_1 \ge \cdots \ge w^i_{d_i}$.

Let $b_i = c_i - 1$ if $c_i$ is even and $b_i = c_i$ if $c_i$ is odd. The edges of $E(O_i)$ can be
covered by a set of $b_i \le c_i$ disjoint matchings. Since $M_i$ is a maximum matching
in the subgraph induced by $O_i$, it follows that $b_i W(M_i) \ge W(E(O_i))$ and therefore

$$opt \le \sum_{i=1}^{p} c_i W(M_i).$$

Let $V_1, \ldots, V_p$ be the partition produced by Algorithm 4.2. Let $S_i = S \cap E(V_i)$.
Consider a cluster $V_i$ with vertices $u, v, q \in V_i$ such that $(u, v) \in S_i$. By the
triangle inequality, $w(u, q) + w(v, q) \ge w(u, v)$.

Suppose that $c_i$ is even. Sum this inequality over all $q \ne u, v$, $q \in V_i$, then sum
again over $(u, v) \in S_i$. Note that every edge in $E(V_i) \setminus S_i$ is summed twice. Thus,
every edge $(u, v) \in S_i$ contributes to the total weight of $E(V_i)$, in addition to its
own weight, also at least $\frac{1}{2}(c_i - 2)$ times its weight through the edges incident to
it. Thus, $W(E(V_i)) \ge \frac{1}{2} c_i W(S_i)$.

Suppose now that $c_i$ is odd. In this case $V_i$ contains a vertex, say $v_i$, that
was added to $V_i$ in Step 3 of the algorithm. In the summation, the weight of
edges incident to $v_i$ is used just once. Thus, each edge $(u, v) \in S_i$ contributes
its weight $\frac{1}{2}(c_i - 3)$ times when summed over $V_i \setminus \{u, v, v_i\}$, once more through
$w(u, v_i) + w(v, v_i)$, and once it contributes its own weight. Thus, also in this
case, $W(E(V_i)) \ge \frac{1}{2} c_i W(S_i)$.

By Theorem 2.4 and the assumption $c_1 \ge \cdots \ge c_p$,

$$apx \ge \frac{1}{2} \sum_{i=1}^{p} c_i W(S_i) \ge \frac{1}{2\sqrt{2}} \sum_{i=1}^{p} c_i W(M_i) \ge \frac{1}{2\sqrt{2}}\, opt.$$

Abstract. The hospitals/residents problem is an extensively-studied
many-one stable matching problem. Here, we consider the hospitals/residents
problem where ties are allowed in the preference lists. In this
extended setting, a number of natural definitions for a stable matching
arise. We present the first linear-time algorithm for the problem under
the strongest of these criteria, so-called super-stability. Our new results
have applications to large-scale matching schemes, such as the National
Resident Matching Program in the US, and similar schemes elsewhere.



1 Introduction 

The Hospitals/Residents problem (HR) f4ll4^ is a many-one stable matching 
problem which is so-named because of its application to large-scale matching 
schemes, such as the National Resident Matching Program in the US C21, the 
Canadian Resident Matching Service P , and the Scottish Pre-registration house 
officer Allocations (SPA) matching scheme 0. Each of these centralised schemes 
administers the annual match of graduating medical students to hospital ap- 
pointments in its respective country. 

An instance of HR involves a set $\mathcal{R}$ of residents and a set $\mathcal{H}$ of hospitals,
each resident $r \in \mathcal{R}$ seeking a post at one hospital, and each hospital $h \in \mathcal{H}$
having $q(h) \ge 1$ posts. Each resident in $\mathcal{R}$ ranks a subset of $\mathcal{H}$ in strict order,
and each hospital $h \in \mathcal{H}$ ranks its applicants in strict order. An agent $p \in \mathcal{R} \cup \mathcal{H}$
finds an agent $q \in \mathcal{R} \cup \mathcal{H}$ acceptable if $q$ appears on $p$'s preference list; $p$ finds $q$
unacceptable otherwise. A matching $M$ is a subset of $\mathcal{R} \times \mathcal{H}$, where $(r, h) \in M$
implies that (i) $r$, $h$ find each other acceptable, (ii) $r$ is assigned to at most one
hospital in $M$, and (iii) at most $q(h)$ residents are assigned to $h$ in $M$. A matching
$M$ for an instance of HR is stable if $M$ admits no blocking pair. A blocking pair
$(r, h)$ for $M$ is a resident $r$ and hospital $h$ such that (i) $r$, $h$ find each other
acceptable, (ii) $r$ either is unassigned or prefers $h$ to his assigned hospital in $M$,
and (iii) $h$ either is undersubscribed or prefers $r$ to the worst resident assigned to
it in $M$. If $(r, h)$ form a blocking pair with respect to a matching $M$, then $(r, h)$
is said to block $M$. Also, if $(r, h) \in M$ for some stable matching $M$, then we say
* Supported by Engineering and Physical Sciences Research Council grant number
GR/M13329.



M.M. Halldorsson (Ed.): SWAT 2000, LNCS 1851, pp. 2,53- E7TI 2000. 
(c) Springer- Verlag Berlin Heidelberg 2000 




R.W. Irving, D.F. Manlove, and S. Scott 



that $(r, h)$ is a stable pair, and $r$ is a stable partner of $h$ (and vice versa). Note
that, in view of the definitions of a matching and a blocking pair, we assume
throughout this paper, without loss of generality, that an agent $p$ finds an agent
$q$ acceptable if and only if $q$ finds $p$ acceptable. We say that the preference list
of a resident $r \in \mathcal{R}$ (resp. hospital $h \in \mathcal{H}$) is complete if $r$ (resp. $h$) finds all
hospitals in $\mathcal{H}$ (resp. residents in $\mathcal{R}$) acceptable.

The classical Stable Marriage problem (SM) mm is a restriction of HR 
in which each hospital has exactly one post, the number of hospitals equals the 
number of residents, and all preference lists are complete. For a given instance 
I of HR, the Gale/Shapley algorithm for SM ^ may be extended in order to 
find a stable matching for I (such a matching in I always exists) in 0(mn) 
time, where n = \TZ\ and m = \H\ ^ Section 1.6.3]. The Gale/Shapley algo- 
rithm incorporates a sequence of proposals from one set of agents to the other; 
if the residents propose to the hospitals (the resident- oriented algorithm), then 
we obtain a stable matching M which is uniquely favourable to the residents: 
every resident assigned in M is assigned to his best stable partner, and every 
resident unassigned in M is unassigned in any stable matching Section 1.6.3]. 
Analogously, if the hospitals propose to the residents (the hospital- oriented algo- 
rithm), then we obtain a stable matching M which is uniquely favourable to the 
hospitals: every hospital h G TL is assigned either its q{h) best stable partners, or 
a set of fewer than q{h) residents; in the latter case, no other resident is assigned 
to h in any stable matching 0, Section 1.6.2]. 
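
The resident-oriented extension of the Gale/Shapley algorithm described above can be sketched as follows, together with a checker for blocking pairs. This is an illustrative sketch, not the implementation referred to in the text: preference lists are plain Python lists in strict order, the worst assignee is found by a linear scan (so the stated time bound is not claimed for this code), and the names `resident_oriented_hr` and `blocking_pairs` are hypothetical.

```python
def resident_oriented_hr(res_prefs, hosp_prefs, capacity):
    """Residents propose in preference order; each hospital keeps its best
    q(h) proposers so far. Returns the matching as a set of (r, h) pairs."""
    rank = {h: {r: i for i, r in enumerate(lst)} for h, lst in hosp_prefs.items()}
    assigned = {h: [] for h in hosp_prefs}
    next_choice = {r: 0 for r in res_prefs}
    free = list(res_prefs)
    while free:
        r = free.pop()
        prefs = res_prefs[r]
        while next_choice[r] < len(prefs):
            h = prefs[next_choice[r]]
            next_choice[r] += 1
            if r not in rank[h]:
                continue  # h does not find r acceptable
            assigned[h].append(r)
            if len(assigned[h]) > capacity[h]:
                worst = max(assigned[h], key=rank[h].__getitem__)
                assigned[h].remove(worst)
                if worst == r:
                    continue  # r was rejected; try the next hospital
                free.append(worst)  # the displaced resident proposes again
            break
    return {(r, h) for h, rs in assigned.items() for r in rs}


def blocking_pairs(matching, res_prefs, hosp_prefs, capacity):
    """List all pairs (r, h) that block the matching (strict preferences)."""
    hosp_of = dict(matching)
    res_of = {h: [r for r, hh in matching if hh == h] for h in hosp_prefs}
    out = []
    for r, prefs in res_prefs.items():
        for h in prefs:
            if hosp_of.get(r) == h:
                break  # every later hospital is worse than r's assignment
            hp = hosp_prefs[h]
            if r not in hp:
                continue
            if len(res_of[h]) < capacity[h] or any(
                hp.index(r) < hp.index(x) for x in res_of[h]
            ):
                out.append((r, h))
    return out


if __name__ == "__main__":
    res_prefs = {"r1": ["h1", "h2"], "r2": ["h1", "h2"], "r3": ["h1", "h2"]}
    hosp_prefs = {"h1": ["r3", "r1", "r2"], "h2": ["r1", "r2", "r3"]}
    capacity = {"h1": 2, "h2": 1}
    M = resident_oriented_hr(res_prefs, hosp_prefs, capacity)
    print(sorted(M), blocking_pairs(M, res_prefs, hosp_prefs, capacity))
```

In the sample instance all three residents prefer `h1`, which keeps its two favourites and pushes `r2` to `h2`; the checker then reports no blocking pair.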

Although an instance of HR may admit more than one stable matching, every 
stable matching has the same size, matches exactly the same set of residents, 
and fills exactly the same number of posts at each hospital; indeed any hospital 
that is undersubscribed in one stable matching is assigned exactly the same set 
of residents in all stable matchings. (These results are collectively known as the 
‘Rural Hospitals Theorem’ H2EIE1-) 

Ties in the preference lists. A natural generalisation of HR occurs when 
each agent’s preference list need not be strictly ordered, but may include ties - 
we refer to this extension as the Hospitals/Residents problem with Ties (HRT). 
When ties are permitted, more than one definition of stability is possible jS| • 

According to the weakest of these stability notions, a matching M is weakly 
stable [S| if M admits no blocking pair, where a blocking pair (r, h) for M is
a resident r and hospital h such that (i) r, h find each other acceptable, (ii) r 
either is unassigned or strictly prefers h to his assigned hospital in M, and (iii) 
h either is undersubscribed or strictly prefers r to the worst resident assigned to 
it in M. Given an instance I of HRT, the existence of a weakly stable matching 
is guaranteed: by breaking the ties in / arbitrarily, we obtain an instance I' of 
HR, and clearly a stable matching in /' is weakly stable in I. Indeed, a converse 
of sorts holds, giving the following proposition, whose proof is straightforward 
and is omitted. 

^ Note that throughout this paper, the form of stability to which the term blocking 
pair refers should be clear from the context. 
Proposition 1. Let I be an instance of HRT, and let M be a matching in I.
Then M is weakly stable in I if and only if M is stable in some instance I' of 
HR obtained from I by breaking the ties in I in some way. 

However, the weakly stable matchings in I may be of different cardinality, and
each of the problems of finding the maximum or minimum size of a weakly stable
matching in an HRT instance is NP-hard, though approximable within a factor
of 2 [?].
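
The weak-stability definition and the remark on differing cardinalities can be checked on a small case. The sketch below assumes a hypothetical representation in which each preference list is a list of ties (each tie a list of equally ranked agents), a matching is a dictionary from residents to hospitals, and every hospital has at least one post; none of these names come from the paper.

```python
def rank_of(pref, x):
    """Index of the tie containing x in a list-of-ties preference list, or None."""
    for i, tie in enumerate(pref):
        if x in tie:
            return i
    return None

def weak_blocking_pairs(matching, res_prefs, hosp_prefs, capacity):
    """Return all pairs (r, h) that block `matching` under weak stability."""
    assignees = {h: [r for r, hh in matching.items() if hh == h] for h in hosp_prefs}
    pairs = []
    for r, pref in res_prefs.items():
        for h in hosp_prefs:
            rr, hr = rank_of(pref, h), rank_of(hosp_prefs[h], r)
            if rr is None or hr is None:
                continue                      # condition (i) fails
            cur = matching.get(r)
            # (ii) r is unassigned or strictly prefers h to his hospital
            r_wants = cur is None or rr < rank_of(pref, cur)
            # (iii) h is undersubscribed or strictly prefers r to its worst assignee
            worst = max((rank_of(hosp_prefs[h], s) for s in assignees[h]), default=None)
            h_wants = len(assignees[h]) < capacity[h] or hr < worst
            if r_wants and h_wants:
                pairs.append((r, h))
    return pairs

# Hypothetical instance: h1 is indifferent between r1 and r2.
res_prefs = {"r1": [["h1"]], "r2": [["h1"], ["h2"]]}
hosp_prefs = {"h1": [["r1", "r2"]], "h2": [["r2"]]}
capacity = {"h1": 1, "h2": 1}
big = {"r1": "h1", "r2": "h2"}     # size 2
small = {"r2": "h1"}               # size 1
```

Neither `big` nor `small` admits a blocking pair, so this instance has weakly stable matchings of sizes 2 and 1.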

A stronger form of stability may be defined as follows: a matching M is 
super-stable if M admits no blocking pair, where a blocking pair (r, h) for 
M is a resident r and hospital h such that (i) (r, h) ∉ M, (ii) r, h find each
other acceptable, (iii) r either is unassigned or strictly prefers h to his assigned 
hospital in M or is indifferent between them, and (iv) h either is undersubscribed 
or strictly prefers r to the worst resident assigned to it in M or is indifferent 
between them. Clearly a super-stable matching is weakly stable. Additionally, 
the super-stability definition gives rise to the following analogue of Proposition
1 (again, the proof is straightforward and is omitted):

Proposition 2. Let I be an instance of HRT, and let M be a matching in I.
Then M is super-stable in I if and only if M is stable in every instance I' of
HR obtained from I by breaking the ties in I in some way.

It should be clear that an instance I of HRT may not admit a super-stable
matching: as a simple example, suppose that each hospital has just one post, and
every agent’s list is a single tie of length 2. It is the purpose of this paper to
present optimal O(mn) algorithms (linear in the size of the problem instance) for
determining whether a given instance of HRT admits a super-stable matching, and
if it does, to construct such a matching. The first algorithm, presented in Section
2, is resident-oriented in that it involves a sequence of proposals from the
residents to the hospitals, and has similar optimality implications for the residents
to those of the resident-oriented algorithm for HR. Also in Section 2 we prove
an analogue of the Rural Hospitals Theorem for HRT. The second algorithm,
presented in Section 3, is the hospital-oriented version, incorporating proposals
from the hospitals to the residents, with analogous optimality implications for
the hospitals to those of the hospital-oriented algorithm for HR.
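
The simple example above can be confirmed by exhaustive search. The following sketch assumes the same hypothetical list-of-ties representation (a preference list is a list of ties, a matching is a dictionary from residents to hospitals, capacities are at least 1); it enumerates every matching of a tiny instance and keeps those with no super-stability blocking pair. It is a brute-force check for intuition only, not one of the algorithms of this paper.

```python
from itertools import product

def rank_of(pref, x):
    """Index of the tie containing x in a list-of-ties preference list, or None."""
    for i, tie in enumerate(pref):
        if x in tie:
            return i
    return None

def super_blocking_pairs(matching, res_prefs, hosp_prefs, capacity):
    """Return all pairs (r, h) that block `matching` under super-stability."""
    assignees = {h: [r for r, hh in matching.items() if hh == h] for h in hosp_prefs}
    pairs = []
    for r, pref in res_prefs.items():
        for h in hosp_prefs:
            if matching.get(r) == h:
                continue                                  # (i) requires (r, h) not in M
            rr, hr = rank_of(pref, h), rank_of(hosp_prefs[h], r)
            if rr is None or hr is None:
                continue                                  # (ii) mutual acceptability
            cur = matching.get(r)
            r_ok = cur is None or rr <= rank_of(pref, cur)        # (iii)
            ranks = [rank_of(hosp_prefs[h], s) for s in assignees[h]]
            h_ok = len(ranks) < capacity[h] or hr <= max(ranks)   # (iv)
            if r_ok and h_ok:
                pairs.append((r, h))
    return pairs

def all_matchings(res_prefs, hosp_prefs, capacity):
    """Yield every matching (as a resident -> hospital dict) of a small instance."""
    residents = list(res_prefs)
    options = [[None] + [h for tie in res_prefs[r] for h in tie
                         if rank_of(hosp_prefs[h], r) is not None]
               for r in residents]
    for choice in product(*options):
        m = {r: h for r, h in zip(residents, choice) if h is not None}
        if all(sum(1 for x in m.values() if x == h) <= capacity[h] for h in hosp_prefs):
            yield m

# The example from the text: one post per hospital, every list a single tie of length 2.
res_prefs = {"r1": [["h1", "h2"]], "r2": [["h1", "h2"]]}
hosp_prefs = {"h1": [["r1", "r2"]], "h2": [["r1", "r2"]]}
capacity = {"h1": 1, "h2": 1}
super_stable = [m for m in all_matchings(res_prefs, hosp_prefs, capacity)
                if not super_blocking_pairs(m, res_prefs, hosp_prefs, capacity)]
```

For this instance `super_stable` is empty: every matching leaves some pair in which both agents are at least indifferent to switching.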

For space reasons, the majority of our attention is focused on the resident- 
oriented algorithm for HRT. It is this algorithm that is likely to be of more 
significance to implementors of large-scale matching schemes, since recent pres- 
sure from student bodies has ensured that all three matching schemes mentioned 
above essentially employ the resident-oriented algorithm for HR. 



Applications. Note that permitting ties in the preference lists has important 
practical applications. In the context of centralised matching schemes, some 
participating hospitals with many applicants have found the task of producing 
a strictly ordered preference list difficult, and they have expressed a desire to 
include ties in their lists. In such a setting, choosing the weak stability definition 
leads to two problems: (i) finding a weakly stable matching that matches as many 
residents as possible, and (ii) the possibility of, say, a resident r persuading, by
some means, a hospital h to accept r at the expense of some allocated resident r' , 
if h is indifferent between r and r' . The super-stability definition clearly avoids 
problem (ii), and additionally guards against problem (i), as is demonstrated by 
the following proposition, which is a consequence of Propositions 1 and 2 and
the Rural Hospitals Theorem for HR. 

Proposition 3. Let I be an instance of HRT, and suppose that I admits a 
super-stable matching M . Then the Rural Hospitals Theorem holds for the set of 
weakly stable matchings in I . 

Thus Proposition 3 tells us that if a super-stable matching exists, then all weakly
stable matchings are of the same size, and match exactly the same set of res- 
idents. Of course, as observed earlier, a super-stable matching need not exist. 
Nonetheless, it is arguable that a super-stable matching should be preferred by a 
practical matching scheme in cases when one does exist. In Section 4 we address
the issue of the existence of super-stable matchings in an HRT instance. 

Previous work. As mentioned above, optimal algorithms for constructing sta- 
ble matchings in an instance of HR are known. For the case where ties are 
permitted, there is an optimal O(n^2) algorithm, due to Irving [?], for determining
whether a given (one-one) instance of Stable Marriage in which preference
lists are complete but may incorporate ties (henceforth SMT) admits a super- 
stable matching, and for constructing one if it does, where n is the number of 
men and women. However, the problem of formulating such an algorithm for the 
(many-one) HRT case has remained open until now. 

2 Resident-Oriented Algorithm for HRT 

For a given instance of HRT, Algorithm HRT-Super-Res, shown in Figure 1,
determines whether a super-stable matching exists, and if so will find such a 
matching. We shall describe informally the execution of Algorithm HRT-Super- 
Res. Before doing so, we make a number of definitions. 

For a given instance I of HRT, suppose that (r, h) ∈ M for some super-stable
matching M. Then (r, h) is a super-stable pair, and r is a super-stable partner
of h (and vice versa). The term delete the pair (r, h) implies that r, h are to be
deleted from each other’s preference lists. By the head of a resident’s preference 
list, we mean the set of one or more hospitals, tied in his current list (i.e. his 
preference list after any deletions have been carried out), which he strictly prefers 
to all other hospitals in his list. Similarly, the tail of a hospital’s list refers to the 
set of one or more residents, tied in its current list, to whom it strictly prefers 
all other residents in its list. By the term reduced lists, we mean the current lists 
at the termination of Algorithm HRT-Super-Res. 
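
The head, tail and deletion operations defined above can be sketched as follows. The list-of-ties representation (best tie first) and the function names are assumptions made for illustration; they are not part of the paper.

```python
def head(resident_list):
    """Hospitals tied at the top of a resident's current list."""
    return set(resident_list[0]) if resident_list else set()

def tail(hospital_list):
    """Residents tied at the bottom of a hospital's current list."""
    return set(hospital_list[-1]) if hospital_list else set()

def delete_pair(r, h, res_lists, hosp_lists):
    """Delete r and h from each other's current lists, dropping emptied ties."""
    res_lists[r] = [t for t in ([x for x in tie if x != h] for tie in res_lists[r]) if t]
    hosp_lists[h] = [t for t in ([x for x in tie if x != r] for tie in hosp_lists[h]) if t]

# Hypothetical current lists.
res_lists = {"r1": [["h1", "h2"], ["h3"]]}
hosp_lists = {"h1": [["r2"], ["r1", "r3"]]}
head_before = head(res_lists["r1"])
tail_before = tail(hosp_lists["h1"])
delete_pair("r1", "h1", res_lists, hosp_lists)
```

After the deletion the head of r1's list is the single hospital h2 and the tail of h1's list is the single resident r3.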

Algorithm HRT-Super-Res involves a sequence of proposals from the residents
to the hospitals, in the spirit of the resident-oriented Gale/Shapley algorithm
for HR. A resident proposes simultaneously to all hospitals at the head
of his list, and all proposals are provisionally accepted. If a hospital h becomes
oversubscribed, it turns out that none of h's worst-placed assignees (there must
be more than one), nor any residents tied with these assignees in h’s list, can
be a super-stable partner of h; such pairs (r, h) are deleted. If a hospital h is
full, then no resident strictly inferior to h’s worst-placed assignee(s) can be
a super-stable partner of h; again such pairs (r, h) are deleted. The proposal
sequence terminates once every resident either is assigned to a hospital or has
an empty list. At this point, it turns out that if a resident is assigned to more
than one hospital, or some hospital is undersubscribed but was previously full,
then no super-stable matching exists. Otherwise, the assignment relation is a
super-stable matching.

assign each resident to be free;
assign each hospital to be totally unsubscribed;
for each hospital h loop
    full(h) := false;
end loop;
while some resident r is free and has a nonempty list loop
    for each hospital h at the head of r’s list loop
        provisionally assign r to h;  {r “proposes” to h}
        if h is oversubscribed then  (†)
            for each resident s' at the tail of h’s list loop
                if s' is provisionally assigned to h then
                    break the assignment;
                end if;
                delete the pair (s', h);
            end loop;
        end if;
        if h is full then  (‡)
            full(h) := true;
            s := worst resident provisionally assigned to h;  {any one, if > 1}
            for each strict successor s' of s on h’s list loop
                delete the pair (s', h);
            end loop;
        end if;
    end loop;
end loop;
if some resident is multiply assigned or
   (some hospital h is undersubscribed and full(h)) then
    no super-stable matching exists;
else
    the assignment relation is a super-stable matching;
end if;

Fig. 1. Algorithm HRT-Super-Res.
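
The steps of Algorithm HRT-Super-Res can be transcribed into Python roughly as follows. This is a direct, unoptimised sketch that does not use the data structures needed for the stated time bound. It assumes a hypothetical list-of-ties input format, mutually consistent preference lists (r appears on h's list exactly when h appears on r's list) and capacities of at least 1; all names are illustrative.

```python
def hrt_super_res(res_prefs, hosp_prefs, capacity):
    """Return a super-stable matching (resident -> hospital) or None if none exists."""
    rl = {r: [list(t) for t in p] for r, p in res_prefs.items()}   # current lists
    hl = {h: [list(t) for t in p] for h, p in hosp_prefs.items()}
    at_hosp = {h: set() for h in hl}     # provisional assignees of each hospital
    of_res = {r: set() for r in rl}      # provisional hospitals of each resident
    full = {h: False for h in hl}        # has h ever been full?

    def delete(r, h):
        """Delete the pair (r, h), breaking any provisional assignment."""
        rl[r] = [t for t in ([x for x in tie if x != h] for tie in rl[r]) if t]
        hl[h] = [t for t in ([x for x in tie if x != r] for tie in hl[h]) if t]
        at_hosp[h].discard(r)
        of_res[r].discard(h)

    while True:
        # some resident who is free and has a nonempty list
        r = next((x for x in rl if not of_res[x] and rl[x]), None)
        if r is None:
            break
        for h in list(rl[r][0]):                      # r proposes to the head of his list
            at_hosp[h].add(r)
            of_res[r].add(h)
            if len(at_hosp[h]) > capacity[h]:         # h is oversubscribed
                for s in list(hl[h][-1]):
                    delete(s, h)
            if len(at_hosp[h]) == capacity[h]:        # h is full
                full[h] = True
                worst = max(i for i, tie in enumerate(hl[h]) if at_hosp[h] & set(tie))
                for tie in hl[h][worst + 1:]:         # strict successors of the worst assignee
                    for s in tie:
                        delete(s, h)

    if any(len(hs) > 1 for hs in of_res.values()):
        return None                                   # some resident is multiply assigned
    if any(len(at_hosp[h]) < capacity[h] and full[h] for h in hl):
        return None                                   # undersubscribed but previously full
    return {r: next(iter(hs)) for r, hs in of_res.items() if hs}

# Hypothetical instance in which r2 is indifferent between h1 and h2.
res_prefs = {"r1": [["h1"], ["h2"]], "r2": [["h1", "h2"]], "r3": [["h2"]]}
hosp_prefs = {"h1": [["r1"], ["r2"]], "h2": [["r2", "r3"], ["r1"]]}
capacity = {"h1": 1, "h2": 2}
result = hrt_super_res(res_prefs, hosp_prefs, capacity)
```

Here r1 fills h1, which causes the pair (r2, h1) to be deleted, and r2 and r3 then fill h2.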

In order to establish the correctness of Algorithm HRT-Super-Res, a number 
of lemmas follow. The first three of these deal with the case that the assignment 
relation is claimed to be a super-stable matching. In what follows, I is an instance
of HRT, in which R is the set of residents and H is the set of hospitals.

Lemma 1. If, at the termination of the while loop of Algorithm HRT-Super-Res,
the algorithm reports that the assignment relation M is a super-stable matching, 
then M is indeed a matching. 

Proof. Clearly, no hospital is oversubscribed in M . Also, no resident is multiply 
assigned in M , for otherwise the algorithm would have reported that no super- 
stable matching exists, a contradiction. □ 



Lemma 2. If the pair (r, h) is deleted during an execution of Algorithm HRT- 
Super-Res, then that pair cannot block any matching generated by Algorithm 
HRT-Super-Res, comprising pairs that are never deleted.

Proof. Let M be a matching generated by Algorithm HRT-Super-Res, com- 
prising pairs that are never deleted, and suppose that (r, h) is deleted during 
execution of the algorithm. If h is full in M, then h strictly prefers its worst- 
placed assignee in M to r, since r is a strict successor of any undeleted entries 
in the reduced list of h. Hence (r, h) does not block M in this case. Now suppose 
that h is undersubscribed in M. As the pair (r, h) is deleted by the algorithm, 
then during some iteration of the while loop, h must have been full. Hence the 
algorithm would have reported that no super-stable matching exists rather than 
generating M, a contradiction. □ 



Lemma 3. If, at the termination of the while loop of Algorithm HRT-Super-Res, 
the algorithm reports that the assignment relation M is a super-stable matching, 
then M is indeed a super-stable matching. 

Proof. By Lemma 1, the assignment relation M is a matching. Now suppose that
M is blocked by some pair (r, h). Then r and h are acceptable to each other, so
that each is on the original preference list of the other. By Lemma 2, the pair
(r, h) has not been deleted. Hence each is on the reduced list of the other.

As the reduced list of r is nonempty, r is assigned to some hospital h' in M.
Now h' ≠ h, as (r, h) blocks M. If r strictly prefers h to h', then the pair (r, h)
has been deleted, since h' is at the head of the reduced list of r, a contradiction. 
Thus r is indifferent between h and h' , so that r proposed to h during the 
execution of the algorithm. Hence r is assigned to h in M, for otherwise the pair 
(r, h) would have been deleted, a contradiction. Thus (r, h) does not block M, a 
contradiction. □ 

The next lemma shows that Algorithm HRT-Super-Res will never delete a pair 
that could belong to some super-stable matching. 

Lemma 4. No super-stable pair is ever deleted during an execution of Algorithm 
HRT-Super-Res. 
Proof. Suppose, for a contradiction, that (r, h) is the first super-stable pair to
be deleted during an execution of Algorithm HRT-Super-Res. Let M be a
super-stable matching in I such that (r, h) ∈ M.

Case (i). Suppose that (r, h) is deleted as a result of h being oversubscribed. 
Consider the assignment relation G at point (†) in the same iteration of the
while loop. At this point, some resident s is provisionally assigned to h in G,
where (s, h) ∉ M and h strictly prefers s to r or is indifferent between them,
since (r, h) ∈ M and h cannot be oversubscribed in M. There is no super-stable
matching in which s is assigned to a hospital h' which he strictly prefers to h. 
For otherwise, the super-stable pair (s, h') would have been deleted before (r, h), 
in order for s to propose to h, a contradiction. Thus either s is unassigned in 
M, or s is assigned to h' in M, where s strictly prefers h to h' or is indifferent 
between them. In any of these cases, (s, h) blocks M, a contradiction. 

Case (ii). Suppose that (r, h) is deleted as a result of h being full. Consider the 
assignment relation G at point (‡) in the same iteration of the while loop. At
this point, some resident s is provisionally assigned to h in G, where (s, h) ∉ M
and h strictly prefers s to r, since (r, h) ∈ M and r is not assigned to h in G. As
in Case (i), there is no super-stable matching in which s is assigned to a hospital
which he strictly prefers to h. Thus again, (s, h) blocks M, a contradiction. □ 

The next two lemmas deal with the case that Algorithm HRT-Super-Res claims 
the non-existence of a super-stable matching. 

Lemma 5. If, at the termination of the while loop of Algorithm HRT-Super-Res, 
some resident is multiply assigned, then I admits no super-stable matching. 

Proof. Let G be the assignment relation at the termination of the while loop. 
Suppose, for a contradiction, that there exists a super-stable matching M in I.

Firstly, we claim that some hospital must have fewer assignees in M than it
has provisional assignees in G. For, suppose not. Let p_G(h) denote the provisional
assignees of hospital h in G, and let p_M(h) denote the assignees of hospital h in
M, for any h ∈ H. Then by hypothesis,

\[ \sum_{h \in H} |p_M(h)| \ge \sum_{h \in H} |p_G(h)|. \tag{1} \]

Now if some resident r is not provisionally assigned to a hospital in G, then the
reduced list of r is empty, so that by Lemma 4, r is unassigned in any super-stable
matching. Thus, letting R_1 denote the residents who are provisionally
assigned to at least one hospital in G, and letting R_2 denote the residents who
are assigned to a hospital in M, we have |R_2| ≤ |R_1|. Hence

\[ \sum_{h \in H} |p_M(h)| = |R_2| \le |R_1| < \sum_{h \in H} |p_G(h)|, \]

as some resident is multiply assigned in G, which contradicts Inequality (1). Thus
the claim is established, so that some hospital h has fewer assignees in M than
it has provisional assignees in G. Hence h is undersubscribed in M, since no
hospital is oversubscribed in G. In particular, some resident r is assigned to h
in G but not in M. Thus by Lemma 4, r cannot be assigned to a hospital in M
which he strictly prefers to h. Hence (r, h) blocks M, a contradiction. □

Lemma 6. If some hospital h became full during the while loop of Algorithm 
HRT-Super-Res, and h subsequently ends up undersubscribed at the termination
of the while loop, then I admits no super-stable matching. 

Proof. Let G be the assignment relation at the termination of the while loop. 
Suppose, for a contradiction, that there exists a super-stable matching M in I. 
By Lemma 5, no resident is multiply assigned in G. Let h' be a hospital which
became full during the while loop and subsequently ends up undersubscribed in
G. Then there is some resident r' who was provisionally assigned to h' at some
point during the while loop, but is not assigned to h' in G. Thus the pair (r', h')
was deleted during some iteration of the while loop, so that (r', h') ∉ M by
Lemma 4.

Now let p_G(h), p_M(h), R_1, R_2 be defined as in the proof of Lemma 5. Firstly,
we claim that if any hospital h is undersubscribed in M, then every resident
provisionally assigned to h in G is also assigned to h in M. For, if some resident
r is assigned to h in G but not in M, then (r, h) blocks M, since h is
undersubscribed in M, and by Lemma 4, r cannot be assigned to a hospital in M which
he strictly prefers to h.

Secondly, we claim that each hospital has the same number of provisional
assignees in G as it has assignees in M. For, by the first claim, any hospital that
is full in G is also full in M, and any hospital that is undersubscribed in G fills
as many places in M as it does in G. Hence |p_M(h)| ≥ |p_G(h)| for each h ∈ H.
As in the proof of Lemma 5, we also have

\[ \sum_{h \in H} |p_M(h)| = |R_2| \le |R_1| = \sum_{h \in H} |p_G(h)|, \]

since no resident is multiply assigned in G. Hence |p_M(h)| = |p_G(h)| for each
h ∈ H.

Thus (r', h') blocks M, since h' is undersubscribed in M by the second claim,
and by Lemma 4, r' cannot be assigned to a hospital in M which he strictly
prefers to h'. □

Together, the preceding lemmas establish the correctness of Algorithm HRT-Super-Res. In
addition, Lemma 4 implies that there is an optimality property for the partner of
a given assigned resident in any super-stable matching output by the algorithm. 
In particular, we have proved: 

Theorem 1. For a given instance of HRT, Algorithm HRT-Super-Res deter- 
mines whether or not a super-stable matching exists. If such a matching does 
exist, all possible executions of the algorithm find one in which every assigned 
resident has as good a partner as in any super-stable matching, and every unas- 
signed resident is unassigned in all super-stable matchings. 
By a suitable choice of data structures, Algorithm HRT-Super-Res can be
implemented to run in O(mn) time and space, where m = |H| and n = |R|. The
time bound follows by noting that the number of iterations of the while loop
is bounded by the number of deletions from the preference lists. Note that the
complexity of Algorithm HRT-Super-Res can also be expressed in terms of L,
the total length of all preference lists in the HRT instance: clearly the running
time is then O(L). Since SM is a special case of HRT, the Ω(L) lower bound of
Ng and Hirschberg [?] for SM implies that Algorithm HRT-Super-Res for HRT
is optimal.

We now present the Rural Hospitals Theorem for HRT under super-stability. 



Theorem 2. Let I be a given instance of HRT. Then: 

1. Each hospital is assigned the same number of residents in all super-stable 
matchings. 

2. Exactly the same residents are unassigned in all super-stable matchings. 

3. Any hospital that is undersubscribed in one super-stable matching is matched 
with exactly the same set of residents in all super-stable matchings. 

Proof. Let M, M' be two super-stable matchings in I. Let I' be an instance of HR
obtained from I by resolving the ties in I arbitrarily. Then by Proposition 2, each
of M, M' is stable in I'. By the Rural Hospitals Theorem for stable matchings in
an instance of HR [?, Theorem 1.6.3], each hospital is assigned the same number
of residents in M and M', exactly the same residents are unassigned in M and
M', and any hospital that is undersubscribed in M is matched with exactly the
same set of residents in M'. □

3 Hospital-Oriented Algorithm for HRT 

In this section, we consider the hospital-oriented analogue of Algorithm HRT- 
Super-Res, namely Algorithm HRT-Super-Hosp, shown in Figure 2. We begin
by describing the execution of Algorithm HRT-Super-Hosp informally. 

Algorithm HRT-Super-Hosp involves a sequence of proposals from the hospitals
to the residents, in the spirit of the hospital-oriented Gale/Shapley algorithm
for HR. A hospital h proposes simultaneously to the most preferred resident r
on h’s list not already provisionally assigned to h, and to all other residents
tied with r in h’s list. These proposals are provisionally accepted. If a resident r
becomes multiply assigned and is indifferent between his provisional assignees,
it turns out that neither of r’s provisional assignees, nor any hospitals tied with
them in r’s list, can be a super-stable partner of r; such pairs (r, h') are deleted.
If a resident r receives a proposal from a hospital h, then no hospital h' to whom
r strictly prefers h can be a super-stable partner of r; again such pairs (r, h') are
deleted. The proposal sequence terminates once every hospital is either full or
provisionally assigned to everyone on its current list. At this point, it turns out
that if a hospital is oversubscribed, or some resident is unassigned but was
previously provisionally assigned, then no super-stable matching exists. Otherwise,
the assignment relation is a super-stable matching.
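
The hospital-oriented procedure can be sketched in Python in the same spirit. As before, this is an unoptimised illustration that assumes a hypothetical list-of-ties input format, mutually consistent preference lists and capacities of at least 1; the function and variable names are not taken from the paper.

```python
def hrt_super_hosp(res_prefs, hosp_prefs, capacity):
    """Return a super-stable matching (resident -> hospital) or None if none exists."""
    rl = {r: [list(t) for t in p] for r, p in res_prefs.items()}   # current lists
    hl = {h: [list(t) for t in p] for h, p in hosp_prefs.items()}
    at_hosp = {h: set() for h in hl}
    of_res = {r: set() for r in rl}
    ever = {r: False for r in rl}        # has r ever been provisionally assigned?

    def delete(r, h):
        """Delete the pair (r, h), breaking any provisional assignment."""
        rl[r] = [t for t in ([x for x in tie if x != h] for tie in rl[r]) if t]
        hl[h] = [t for t in ([x for x in tie if x != r] for tie in hl[h]) if t]
        at_hosp[h].discard(r)
        of_res[r].discard(h)

    def rank(pref, x):
        return next(i for i, tie in enumerate(pref) if x in tie)

    def next_proposal():
        """An undersubscribed hospital and the best tie containing a non-assignee."""
        for h in hl:
            if len(at_hosp[h]) < capacity[h]:
                for tie in hl[h]:
                    if set(tie) - at_hosp[h]:
                        return h, list(tie)
        return None

    while True:
        proposal = next_proposal()
        if proposal is None:
            break
        h, tie = proposal
        for r in tie:                                  # h proposes to the whole tie
            at_hosp[h].add(r)
            of_res[r].add(h)
            ever[r] = True
            ranks = {rank(rl[r], x) for x in of_res[r]}
            if len(of_res[r]) > 1 and len(ranks) == 1:
                for h2 in list(rl[r][-1]):             # multiply assigned and indifferent
                    delete(r, h2)
            else:
                for t in rl[r][rank(rl[r], h) + 1:]:   # strict successors of h on r's list
                    for h2 in t:
                        delete(r, h2)

    if any(ever[r] and not of_res[r] for r in rl):
        return None                                    # unassigned but previously assigned
    if any(len(at_hosp[h]) > capacity[h] for h in hl):
        return None                                    # some hospital is oversubscribed
    return {r: next(iter(hs)) for r, hs in of_res.items() if hs}

# Hypothetical instance in which r2 is indifferent between h1 and h2.
res_prefs = {"r1": [["h1"], ["h2"]], "r2": [["h1", "h2"]], "r3": [["h2"]]}
hosp_prefs = {"h1": [["r1"], ["r2"]], "h2": [["r2", "r3"], ["r1"]]}
capacity = {"h1": 1, "h2": 2}
result = hrt_super_hosp(res_prefs, hosp_prefs, capacity)
```

On this instance the proposals from h1 and h2 are all accepted without conflict, and the pair (r1, h2) is deleted once h1 has proposed to r1.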
assign each resident to be free;
assign each hospital to be totally unsubscribed;
for each resident r loop
    assigned(r) := false;
end loop;
while some hospital h is undersubscribed and
      h’s list contains a resident r' not provisionally assigned to h loop
    r' := most preferred such resident in h’s list;  {any one, if > 1}
    for each resident r tied with r' in h’s list loop  {including r'}
        provisionally assign r to h;  {h “proposes” to r}
        assigned(r) := true;
        if r is multiply assigned and
           r is indifferent between his provisional assignees then
            for each hospital h' at the tail of r’s list loop
                if r is provisionally assigned to h' then
                    break the assignment;
                end if;
                delete the pair (r, h');
            end loop;
        else
            for each strict successor h' of h on r’s list loop
                if r is provisionally assigned to h' then
                    break the assignment;
                end if;
                delete the pair (r, h');
            end loop;
        end if;
    end loop;
end loop;
if (some resident r is not assigned and assigned(r)) or
   some hospital is oversubscribed then
    no super-stable matching exists;
else
    the assignment relation is a super-stable matching;
end if;

Fig. 2. Algorithm HRT-Super-Hosp.



In order to establish the correctness of Algorithm HRT-Super-Hosp, a number 
of lemmas follow. We omit the proofs, which use similar techniques to those of 
Section 2. We begin by stating the analogues of Lemmas 3 and 4 for Algorithm
HRT-Super-Hosp. In what follows, I is an instance of HRT, in which R is the
set of residents and H is the set of hospitals.

Lemma 7. If, at the termination of the while loop of Algorithm HRT-Super- 
Hosp, the algorithm reports that the assignment relation M is a super-stable 
matching, then M is indeed a super-stable matching. 
Lemma 8. No super-stable pair is ever deleted during an execution of Algorithm
HRT-Super-Hosp. 

The next two lemmas deal with the case that Algorithm HRT-Super-Hosp claims 
the non-existence of a super-stable matching. 

Lemma 9. If, at the termination of the while loop of Algorithm HRT-Super- 
Hosp, some hospital is oversubscribed, then I admits no super-stable matching. 



Lemma 10. If some resident r became assigned during the while loop of Algo- 
rithm HRT-Super-Hosp, and r subsequently ends up unassigned at the termina- 
tion of the while loop, then I admits no super-stable matching. 

Together, Lemmas 7-10 establish the correctness of Algorithm HRT-Super-Hosp.
In addition, Lemma 8 implies that there is an optimality property for the
assignees of a given fully-subscribed hospital in any super-stable matching output
by the algorithm. In particular, we have proved: 

Theorem 3. For a given instance of HRT, Algorithm HRT-Super-Hosp deter- 
mines whether or not a super-stable matching exists. If such a matching does 
exist, all possible executions of the algorithm find one in which every hospital 
h ∈ H is assigned either its q(h) best super-stable partners, or a set of fewer
than q(h) residents; in the latter case, no other resident is assigned to h in any
super-stable matching. 

As is the case for Algorithm HRT-Super-Res, by considering suitable data
structures, Algorithm HRT-Super-Hosp can be implemented to run in O(mn) time
and space, where m = |H| and n = |R|. Again, the time bound follows by noting
that the number of iterations of the while loop is bounded by the number of
deletions from the preference lists. Note that the complexity of Algorithm
HRT-Super-Hosp can also be expressed in terms of L, the total length of all preference
lists in the HRT instance: clearly the running time is then O(L). As is the case
for Algorithm HRT-Super-Res, this time bound is optimal.



4 Existence of Super-stable Matchings 

Algorithm HRT-Super-Res has been implemented and some preliminary experi- 
ments have been carried out, in order to give an indication of the likelihood of a 
super-stable matching existing in a given HRT instance. There are clearly several 
parameters that can be varied in these tests, such as the numbers of residents 
and hospitals, the capacities of the hospitals, the lengths of the preference lists, 
and the number, position and sizes of the ties. A range of vectors of values for 
the aforementioned parameters were considered, and for each vector, a set of 
random instances was created, each satisfying the particular constraints on the 
instance. Finally, the percentage of instances in each set admitting a super-stable 
matching was computed. 
Perhaps not surprisingly, the empirical results suggest that the probability of
a super-stable matching existing decreases as the size of the instance increases, 
and also decreases as the number and length of the ties increase. However, it 
was found that the probability of a super-stable matching existing is likely to 
be much higher if the ties occur on one side only, for example in the hospitals’ 
lists and not in the residents’ lists (further details may be found in [?]). This
is a situation that is likely to occur naturally in practice: for example, in the 
context of resident/hospital matching schemes, residents are typically asked to 
rank a relatively small number of hospitals, and might find it easier to produce a 
strictly ordered preference list than would a large hospital with many applicants. 

Due to the large number of different parameters that can be varied in em- 
pirical tests, clearly such experiments cannot hope to provide a comprehensive 
analysis of the likelihood of a super-stable matching existing in an arbitrary HRT 
instance. It remains open to establish theoretical bounds on the probability of a 
super-stable matching existing in a given random instance of HRT. 

5 Concluding Remarks 

In this paper we have highlighted the importance of the super-stability criterion 
in HRT, with reference to large-scale matching schemes. Current practice in 
the SPA scheme, for example, is that hospitals are permitted to express ties in 
their preference lists. However, any ties are broken arbitrarily so as to give an 
instance with strictly ordered lists. Hence by Proposition 1, the SPA scheme will
produce matchings that can only be guaranteed to be weakly stable in the original
instance. We suggest that such centralised matching schemes should first search 
for a super-stable matching using Algorithm HRT-Super-Res, and only if none 
exists should they settle for a weakly stable matching. 

We finish with an open problem. A third stability criterion, so-called strong 
stability, can be applied to an HRT instance [?]. In the strong stability case, the
definition of a blocking pair is similar to that of the super-stability case, except 
that at most one agent in the pair is permitted to express indifference between 
the other agent and its (possibly worst) partner(s) in the matching. Clearly a 
super-stable matching is strongly stable, and a strongly stable matching is weakly 
stable. Additionally, the strong stability and super-stability definitions coincide 
if the ties belong to the preference lists of one set of agents only. As is the case 
for super-stability, a given instance of HRT may not admit a strongly stable 
matching (see [?] for further details). However, there is an O(n^4) algorithm,
due to Irving [?], for determining whether a given instance of SMT admits a
strongly stable matching, and for constructing one if it does, where n is the
number of men and women. An extended version of this algorithm, also of O(n^4)
complexity, has been formulated by Manlove for SMTI (the variant of SMT in
which preference lists may be incomplete) [?]. We leave open the problem of
constructing a polynomial-time algorithm, or establishing NP-completeness, for 
HRT under strong stability. 
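
To make the strong-stability blocking condition concrete, the following Python sketch tests it directly from the description above: a pair blocks if neither agent would be worse off and at least one strictly prefers the other. The list-of-ties representation, the function names and the instances are assumptions made for this example, and capacities are taken to be at least 1.

```python
def rank_of(pref, x):
    """Index of the tie containing x in a list-of-ties preference list, or None."""
    for i, tie in enumerate(pref):
        if x in tie:
            return i
    return None

def strong_blocking_pairs(matching, res_prefs, hosp_prefs, capacity):
    """Return all pairs (r, h) that block `matching` under strong stability."""
    assignees = {h: [r for r, hh in matching.items() if hh == h] for h in hosp_prefs}
    pairs = []
    for r, pref in res_prefs.items():
        for h in hosp_prefs:
            if matching.get(r) == h:
                continue
            rr, hr = rank_of(pref, h), rank_of(hosp_prefs[h], r)
            if rr is None or hr is None:
                continue                       # not mutually acceptable
            cur = matching.get(r)
            # -1: strictly prefers the new partner, 0: indifferent, 1: worse off
            if cur is None:
                r_cmp = -1
            else:
                c = rank_of(pref, cur)
                r_cmp = (rr > c) - (rr < c)
            ranks = [rank_of(hosp_prefs[h], s) for s in assignees[h]]
            if len(ranks) < capacity[h]:
                h_cmp = -1
            else:
                w = max(ranks)                 # rank of h's worst assignee
                h_cmp = (hr > w) - (hr < w)
            # at most one of the two agents may be merely indifferent
            if r_cmp <= 0 and h_cmp <= 0 and (r_cmp < 0 or h_cmp < 0):
                pairs.append((r, h))
    return pairs

# All-tied instance: the perfect matching is strongly stable (though not super-stable).
tied_res = {"r1": [["h1", "h2"]], "r2": [["h1", "h2"]]}
tied_hosp = {"h1": [["r1", "r2"]], "h2": [["r1", "r2"]]}
no_block = strong_blocking_pairs({"r1": "h1", "r2": "h2"}, tied_res, tied_hosp,
                                 {"h1": 1, "h2": 1})

# Here r1 is indifferent but h1 strictly prefers r1 to r2, so (r1, h1) blocks.
res_prefs = {"r1": [["h1", "h2"]], "r2": [["h1"]]}
hosp_prefs = {"h1": [["r1"], ["r2"]], "h2": [["r1"]]}
one_block = strong_blocking_pairs({"r1": "h2", "r2": "h1"}, res_prefs, hosp_prefs,
                                  {"h1": 1, "h2": 1})
```

The second matching is weakly stable, since r1 does not strictly prefer h1, which illustrates how strong stability sits between the other two notions.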
Abstract. Two vertices of an undirected graph are called k-edge-connected
if there exist k edge-disjoint paths between them. The equivalence
classes of this relation are called k-edge-connected classes, or k-classes
for short. This paper shows how to check whether two vertices belong
to the same 5-class of an arbitrary connected graph that is undergoing
edge insertions. For this purpose we suggest (i) a full description of the
4-cuts of an arbitrary graph and (ii) a representation of the k-classes,
1 ≤ k ≤ 5, of size linear in n, the number of vertices of the graph; these
representations can be constructed in polynomial time. Using them, we
suggest an algorithm for incremental maintenance of the 5-classes. The
total time for a sequence of m Edge-Insert updates and q Same-5-Class?
queries is O(q + m + n · log² n); the worst-case time per query is O(1).



1 Introduction 



Connectivity is a fundamental property of graphs which is used in network
reliability analysis, in network design problems, and other applications. In the 1990s,
dynamic maintenance and augmentation of high connectivity has become an
important area of research (see, e.g., |1I2I7HI9I12I15I1H20| ). One of the directions
concerns 1-, 2-, . . . , k-connectivity in an arbitrary graph. As for motivation,
designers of communication networks are usually interested now in analysis and
maintenance for small values of k, even 1 or 2, since networks of higher
connectivity are too expensive. Theorists, on their side, try to extend their techniques
as far as possible to be ahead of today's needs, as usual. However, the complexity of
graph structures grows tremendously as k grows. In this paper we show that
the case k = 5 still admits a compact description and a more or less efficient
incremental algorithm. However, as far as we see, this is due to some lucky
combination of existing methods, while the approach is not likely to be extendible to
greater connectivities.

Let G = (V, E) be an undirected connected multi-graph without loops. A
minimal edge-cut C of G (cut, for short) is an edge set whose removal disconnects
G, while the removal of any proper subset of C does not disconnect G. If |C| = k then
C is called a k-cut. Two vertices u, v are called k-edge-connected if no k'-cut,
k' < k, separates u from v. It is well known that the property “there exist k
edge-disjoint paths between u and v in G” defines the same relation (see [?]).
The equivalence classes of this relation are called the k-edge-connected classes
(k-classes, for short). The partition of V into the (k+1)-classes is a refinement of the
partition of V into k-classes. Thus, the connectivity classes have a hierarchical
structure.
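
For small graphs the k-classes can be computed straight from the edge-disjoint-paths characterisation. The sketch below is a naive static computation by repeated augmenting paths, given only to illustrate the definition; it is not the incremental method of this paper, and the function names and the example graph are assumptions.

```python
from collections import defaultdict, deque

def edge_disjoint_paths(edges, s, t):
    """Maximum number of edge-disjoint s-t paths in an undirected multigraph."""
    if s == t:
        return float("inf")
    cap = defaultdict(int)          # residual capacities, one unit per parallel edge
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:       # breadth-first search for an augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:           # push one unit of flow along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

def k_classes(vertices, edges, k):
    """Partition `vertices` into k-edge-connected classes."""
    classes = []
    for v in vertices:
        for cls in classes:
            # the relation is an equivalence, so one representative per class suffices
            if edge_disjoint_paths(edges, cls[0], v) >= k:
                cls.append(v)
                break
        else:
            classes.append([v])
    return classes

# Hypothetical graph: two triangles joined by the bridge (c, d).
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "d")]
two_classes = k_classes(vertices, edges, 2)
```

The 2-classes of this graph are the two triangles, and they refine the single 1-class, in line with the hierarchical structure noted above.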

In this paper we are concerned with the problem of maintaining the k-classes
of G under edge insertions, i.e., incremental maintenance. The main tool for
solving such a problem is a certain abstract model that describes the graph
connectivity structure in a way that makes it possible to decide efficiently how this
structure changes when the graph is modified. Such a model must represent not only
the connectivity classes but also the system of all minimal cuts that form these
classes (for example, the Gomory-Hu tree [?], which presents a bounded
subsystem of such cuts, cannot serve for efficient incremental maintenance).

Efficient algorithms for the problem of incremental maintenance of the 1-,
2-, 3- and 4-classes are known. Westbrook and Tarjan [?] used the bridge-tree
of a graph to handle its 2-classes. Galil and Italiano [12] and, independently,
La Poutré, van Leeuwen and Overmars [15] used the "cycle-tree" model
of a 2-connected graph to describe and maintain the 3-classes of a connected
graph. The well known cactus tree model [?] represents the (λ+1)-classes of an
arbitrary λ-connected graph and is used for their incremental maintenance [?]
(the bridge-tree and the cycle-tree are, in fact, special cases of this model).
In [7] the 2-level cactus tree model was introduced. It generalizes the cactus
tree model to represent the (λ+1)- and (λ+2)-classes of any λ-connected graph
and serves for their incremental maintenance.

Paper [3] uses the cactus tree model to maintain the 1-, 2-, 3- and 4-classes.
The main innovation used is a special kind of graph object: the 3-component
corresponding to a 3-class A. This graph has A as its vertex set and
mimics the connectivity structure of A in a localized fashion: it contains all
edges of G between vertices in A, and also an edge between each pair of vertices
in A that are connected by a path that travels entirely through vertices
outside A. The 3-component is 3-connected and its cactus tree model is a tree;
this tree represents all 4-classes of G contained in A. In [3], the problem of
incremental maintenance of the 1-, 2-, 3-, and 4-classes is hierarchically
decomposed into subproblems on k'-components of G, 1 ≤ k' ≤ 3.

In general, there are several difficulties in the incremental maintenance of
the connectivity classes (see Figure 1 for illustration). The insertion of an
edge e = (u, v) into G can affect the structure associated with a k'-class that
contains neither u nor v. Furthermore, the changes in such a structure are not
necessarily as simple as those caused by the insertion of an edge between two
vertices in A.



In the literature, classes of k-edge-connectivity are sometimes called
k-edge-connected components. Following the common tradition concerning 1- and
2-components and [?] concerning 3-components, we use the term "component" for a
graph related to such a class.
However, paper [3] shows that the changes for each individual k'-class,
1 ≤ k' ≤ 4, can be performed with only minimal knowledge of the connectivity
structures of the rest of G. When several k-classes merge, the model for the
joint k-class is constructed from the models of the constituent classes.





Fig. 1. Changes in the connectivity classes resulting from an edge insertion.
(A dashed line encircles one 3-class and a dotted line one 4-class, in each graph.)



In fact, the insight of [3] and [7] is crucial for the analysis of the dynamics
of 5-connectivity. Our work constructs an entire building on the cornerstones
laid in these papers. Our main contributions made during this construction
are as follows.

Concerning statics, we suggest, for the first time, a full description of the
4-cuts of an arbitrary graph, thus filling the gap between the descriptions of
[7] and [3].

Generalizing the approach of [3], we associate the 2-level cactus tree model,
instead of the cactus tree model, with each 3-component of G, to describe the
4- and 5-classes contained in it. We extend the localized transformation of
cactus tree models suggested in [3] to 2-level cactus tree models, based on the
incremental maintenance algorithm of [7]. The dynamics are much more complicated
in our case than in [3], since the 2-level cactus tree model is substantially
more complicated than the cactus tree model.

The (amortized) complexities of the incremental algorithms for 1-, 2- and
3-classes depend on the number of updates through the α-function, while the
algorithm of [3] has an O(n log n) term. Our algorithm has an O(n log^2 n) term
instead. To achieve this bound, we suggest a new technique that handles a
wide spectrum of dynamic forest operations, including both arbitrary tree
linking and breaking into two the cyclic order of the children of a tree vertex
(for the first time, to the best of our knowledge). The sizes of all the above
models are linear in |V|.

This paper is organized as follows. Section 2 gives basic definitions and
notation. Section 3 presents the static description of our model. Section 4
deals with the model dynamics; in particular, in Section 4.2 a general example
is given. Section 5 describes the main ideas of the implementation.
2 Preliminaries and Notations

Let G = (V, E) be an undirected connected multi-graph without loops, where
|V| = n ≥ 2 and |E| = m. In this paper we refer only to edge-cuts, so the prefix
"edge-" is omitted from now on. We refer to "minimal cuts" simply as "cuts",
unless otherwise stated. 1-cuts are referred to as bridges. The family of all
k-cuts of G is denoted by F^k.

Let X, Y ⊆ V; the set of edges with one end-vertex in X and the other in
Y is denoted by δ(X,Y); obviously, δ(X,Y) = δ(Y,X). We also denote V \ X
by X̄. Any cut C corresponds to the unique 2-partition (X, X̄) of V such that
C = δ(X, X̄). So we can refer to a cut by the 2-partition defining it.

We say that a cut C = δ(X, X̄) divides S, S ⊆ V (or that C is an S-cut),
if both X ∩ S and X̄ ∩ S are nonempty. We say that a cut divides a subgraph
if it divides its vertex set. A subset S of V, |S| ≥ 2, is k-connected if there
exist in G k edge-disjoint paths between every two vertices in S, that is, there
are no S-cuts of cardinality less than k in G. The connectivity λ(S) of S is
defined to be the maximum k for which S is k-connected, or in other words,
the connectivity of S is the minimum number of edges in an S-cut in G. The
connectivity of G is defined to be λ(V), denoted for short by λ.

For any S ⊆ V, the induced subgraph G(S) consists of the vertices in S
and the edges in E connecting vertices in S. To shrink a subset of vertices
S ⊆ V means to replace G(S) by a single new vertex s, and, for every edge with
one end-vertex in S, to replace this end-vertex by s; any edge of the new graph
is identified with its corresponding edge of G. Let C = (v_1, v_2, ..., v_r, v_1),
r ≥ 2, be a cycle. To squeeze a cycle at v_i and v_j, i < j, is to shrink the
set {v_i, v_j} to a new node v. This operation creates two new cycles from the
old cycle C: (v, v_{j+1}, ..., v_r, v_1, ..., v_{i-1}, v) and
(v, v_{i+1}, ..., v_{j-1}, v). Each of the new cycles may degenerate to the
vertex v. To contract an edge e = (u, v) means to shrink the set {u, v}. To
break an edge e = (u, v) by a vertex x means to add a new vertex x to G, and to
replace the edge e by two new edges (u, x) and (x, v).
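
The shrink and squeeze operations defined above can be sketched on explicit lists. This is only a small illustration under assumed representations: a multi-graph as a list of vertex pairs and a cycle as a list of its vertices in cyclic order with the closing edge implicit; the function names and the 0-based indices are choices made here, not notation of the paper.

```python
def shrink(edges, subset, new_node):
    """Shrink `subset` into `new_node` in a multigraph given as an edge list.

    Edges with both ends in `subset` belong to the induced subgraph and
    disappear; every other edge keeps its identity, with each end-vertex
    in `subset` replaced by `new_node`.
    """
    out = []
    for u, v in edges:
        if u in subset and v in subset:
            continue
        out.append((new_node if u in subset else u,
                    new_node if v in subset else v))
    return out


def squeeze_cycle(cycle, i, j, new_node):
    """Squeeze a cycle at positions i < j (0-based) into `new_node`.

    Returns the two resulting cycles, each starting at `new_node`:
    first (v, v_{j+1}, ..., v_r, v_1, ..., v_{i-1}), then
    (v, v_{i+1}, ..., v_{j-1}). A result of length 1 is a cycle that
    has degenerated to the single vertex `new_node`.
    """
    if not 0 <= i < j < len(cycle):
        raise ValueError("need 0 <= i < j < len(cycle)")
    outer = [new_node] + cycle[j + 1:] + cycle[:i]
    inner = [new_node] + cycle[i + 1:j]
    return outer, inner
```

For example, squeezing the cycle (a, b, c, d, e) at b and d yields the cycles (v, e, a) and (v, c), while squeezing at two adjacent vertices leaves one cycle degenerate.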

For a given partition P of V, the related quotient graph is defined to
be the result of shrinking each part of P into a single vertex. The quotient
mapping φ_P, defined on V, takes any vertex in such a part W to the vertex
given by shrinking W. We denote by Q^k(G) the quotient graph of G generated
by shrinking each k-class, and by φ^k the corresponding quotient mapping. It is
easy to show that for any 1 ≤ k' ≤ k − 1, φ^k provides a bijection between the
k'-cuts of Q^k(G) and the k'-cuts of G. The bridge-tree of a connected graph G
is Q^2(G), see [15,21]. A bridge-path is an ordered sequence of bridges of a
graph which forms a path in its bridge-tree. Similarly, the cycle-tree of a
2-connected graph is Q^3(G) (see [12,15]). A cycle-path in a cycle-tree is an
ordered sequence of its cycles such that any two consecutive cycles have (a
single) common vertex.
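
To make the quotient construction concrete for k = 2, the following sketch computes the 2-classes and the bridge-tree of a small connected multi-graph statically, using the standard DFS low-point test for bridges. It is an illustration of the definition, not the incremental structure of the cited papers; vertices are assumed to be numbered 0..n−1, and bridge_tree is a hypothetical helper name.

```python
from collections import defaultdict


def bridge_tree(n, edges):
    """Return (cls, tree_edges) for a connected multigraph on 0..n-1.

    cls[v] is the index of the 2-class of v; tree_edges lists the bridges
    as pairs of class indices, i.e. the edges of the quotient graph.
    """
    adj = defaultdict(list)
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))

    disc = [-1] * n          # DFS discovery times
    low = [0] * n            # low-points
    bridges = set()
    timer = 0

    def dfs(u, parent_eid):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, eid in adj[u]:
            if eid == parent_eid:        # skip only the edge we came by,
                continue                 # so parallel edges are handled
            if disc[v] == -1:
                dfs(v, eid)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(eid)
            else:
                low[u] = min(low[u], disc[v])

    dfs(0, -1)

    # 2-classes are the connected components left after deleting bridges.
    cls = [-1] * n
    count = 0
    for s in range(n):
        if cls[s] != -1:
            continue
        cls[s] = count
        stack = [s]
        while stack:
            x = stack.pop()
            for y, eid in adj[x]:
                if eid not in bridges and cls[y] == -1:
                    cls[y] = count
                    stack.append(y)
        count += 1

    tree_edges = sorted(
        (min(cls[u], cls[v]), max(cls[u], cls[v]))
        for eid, (u, v) in enumerate(edges) if eid in bridges
    )
    return cls, tree_edges
```

The recursive DFS is adequate for small examples; a large input would need an iterative traversal to avoid the recursion limit.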

For a family F of cuts of G, the equivalence classes of the relation "{x,y}
is not divided by any cut in F" are called F-atoms. A cut model for G
and a family F of cuts of G is a triple (𝒢, ψ, ℱ) as follows. The connected
graph 𝒢 = (𝒱, ℰ) is called the structure graph. We refer to its vertices as
nodes. The mapping ψ : V → 𝒱 is called the structure mapping. For every
node N of 𝒢, ψ^{-1}(N) is either an F-atom or the empty set. If ψ^{-1}(N) = ∅
we say that N is an empty node. We say that a cut C = δ(X, X̄) of 𝒢 ψ-induces
a cut ψ^{-1}(C) = δ(ψ^{-1}(X), ψ^{-1}(X̄)) of G if both ψ^{-1}(X) and ψ^{-1}(X̄)
are nonempty. ℱ is a family of cuts of 𝒢, called the modeling family, such that
ψ^{-1}(ℱ) = F. Observe that, for any cut model, shrinking a subset of nodes of
𝒢 naturally yields a new model: its mapping is the composition of the original
mapping and the quotient one.

The bridge-tree mentioned above is a simple example of a cut model. The 
set of its 1-cuts represents bijectively the set of 1-cuts of G. Another example 
of a cut model, for a 2-connected graph, is the cycle-tree mentioned above. The 
set of its 2-cuts represents bijectively the set of 2-cuts of G. 

The cactus tree model is a cut model for the family of minimum cuts
of a graph and the (λ+1)-classes which V is "cut into" by these cuts. The
cactus tree model is defined by the triple (ℋ, φ, ℱ) as follows. The structure
graph ℋ = (𝒱_ℋ, ℰ_ℋ) is a tree-of-edges-and-cycles graph. It is a connected graph
such that every edge belongs to at most one cycle (in other words, every block
of ℋ is an edge or a cycle); such a graph is called a cactus tree. In the case
λ is odd, ℋ is cycle-free, that is, it is a tree. For every node N of ℋ,
φ^{-1}(N) is either a (λ+1)-class of G or the empty set. The modeling family ℱ
is the family of minimal cuts of ℋ, and φ^{-1}(ℱ) = F^λ. The number of edges in
ℋ is linear in the
number of (λ+1)-classes of G, i.e., is O(n). For an algorithm for constructing
the cactus tree model see [?] (its complexity is O(m + λ^2 n · log(m/n))).

3 Model Description 

In order to construct a model for representation and incremental maintenance of
the k-classes of a graph, 1 ≤ k ≤ 5, we combine two known models. The first one,
suggested by Dinitz and Westbrook in [3], represents the k-classes of a graph,
1 ≤ k ≤ 4, and serves to maintain this representation under edge insertions. The
second one, the 2-level cactus tree model of Dinitz and Nutov [7], serves the
same purpose for 4 ≤ k ≤ 5, for a 3-connected graph.

In this section we follow [?]. The definition of the structure is done via
decomposition of the graph into auxiliary graphs, called components. The model
has a hierarchical structure which uses the above-mentioned models: the
bridge-tree, the cycle-tree, the cactus tree and the 2-level cactus tree.
Consider a connected graph G. Its associated bridge-tree provides a complete
description of the 2-classes in G. The 2-component associated with a 2-class A
is the induced graph G(A). The cycle-tree model of each 2-component is used to
describe its 3-classes. By [?], the collection of all cycle-tree models provides
a complete description of the 3-classes in G.

Let S be a 3-class contained in a 2-class H. Consider the cycle-tree
Q^3(G(H)) of the 2-component of H. Let L be a cycle of Q^3(G(H)) incident to S
and let (u, v_1) and (w, v_2), v_1, v_2 ∈ S, be the edges of G incident to S on
L (see Figure 2). The vertices v_1 and v_2 are called the attachment vertices.
In the case they are distinct, the virtual edge e_S(L) is defined as (v_1, v_2).
The 3-component associated with S is defined as the induced graph G(S) together
with the virtual edges defined by all such cycles. The 3-component of S mimics
the connectivity structure of S in a localized fashion in the following sense.





Fig. 2. The 3-component. 



Theorem 1 ([3]). The 3-component of a 3-class S has the following properties:

(i) It is 3-connected;

(ii) For each k-cut C of G dividing S, k ≥ 3, there is a k'-cut C', k' ≤ k, of
the 3-component dividing S in the same way;

(iii) For each k-cut C' of the 3-component, k ≥ 3, there is a nonempty set
("bunch") of k-cuts of G dividing S in the same way as C' does.

It is easy to deduce from this theorem that the k-classes of G, k ≥ 3, contained
in S are exactly the k-classes of the 3-component of S. Hence, all we need for
the description is an appropriate model for the 4- and 5-classes of a
3-connected graph. Indeed, the collection of such models for all 3-components
then provides a full description of the 4- and 5-classes of G. Paper [3] uses
cactus tree models of 3-components to describe and maintain the 4-classes. For
the 5-classes we use, instead, its extension: the 2-level cactus tree model [7]
(the case λ = 3). Such a model, for a graph G, is as follows.

Theorem 2 ([7]). In the case λ = 3, for F^3 ∪ F^4 there exists a cut model
(ℋ², φ², ℱ²) of size O(n), with the following properties:

(i) The structural graph ℋ² is a tree of edges, cycles, and cube graphs, such
that each node of each cube graph is empty and is incident to exactly one bridge;

(ii) The modeling family ℱ² consists of:

• the 1-cuts (bridges);

• the minimal 2-cuts, which are all pairs of edges of any block of ℋ² that
is a cycle;

• for any block of ℋ² that is a cube graph, the three cuts each consisting
of four of its pairwise nonadjacent edges;

• the non-minimal 2-cuts of ℋ² of the form {{e', e''} : e', e'' ∈ P, P ∈ Π},
where Π is a certain set of bridge-paths of ℋ², such that any two of them
have at most one edge in common.

(iii) The mapping (φ²)^{-1} takes the set of 1-cuts bijectively onto F^3 and the
set of other cuts in ℱ² onto F^4.

Let us get more information from paper [7]. If we shrink each 2-class of the
2-level cactus tree ℋ² into a single node, we obtain a model which is isomorphic
to the cactus tree model of G. It is a tree since λ = 3 is odd; let us denote it
by T. For e = (u,v) ∈ E, P(e) denotes the bridge-path between φ(u) and φ(v) in
T. The set Π of bridge-paths consists of the paths P(e) for all edges e such
that P(e) contains at least two edges.

Paper [7] provides an efficient algorithm for incremental maintenance of the
5-classes of a 3-connected graph. However, the case of a general graph is more
complicated: in particular, an edge insertion causes, in general, merging of
3-classes; thus, we need an algorithm for merging 2-level cactus tree models.

For the general case, we suggest the following full description of the 4-cuts
of an arbitrary graph G (see Figure 3 for illustration).




Fig. 3. Example for 4-cuts description. 



Theorem 3 ([5]). For an arbitrary graph G, the set of its 4-cuts consists of
the following two sub-families:

(i) The 4-cuts dividing a single 3-class. This sub-family is decomposed into
bunches corresponding to the 4-cuts C of the 3-component of S, for all
3-classes S. Any such bunch is the result of all possible independent
substitutions in C of every virtual edge e_S(L) by any edge from the cycle L.

(ii) The 4-cuts dividing exactly two 3-classes. Such classes are 3-classes S_1
and S_2 incident to the same cycle L of Q^3(G), each by two distinct attachment
vertices, such that there exist 3-cuts C_1 and C_2 of the 3-components of S_1
and S_2, respectively, containing the virtual edges e_{S_1}(L) and e_{S_2}(L),
respectively. This sub-family is decomposed into bunches corresponding to the
above pairs of 3-cuts. Any such bunch is the result of all possible independent
substitutions in C_1 ∪ C_2 \ {e_{S_1}(L), e_{S_2}(L)} of every virtual edge
e_S(L'), L' ≠ L, by any edge from the cycle L'.

Sketch of proof. The proof for the case (i) is based on Theorem 1.
Assume a 4-cut C divides some two 3-classes S_1 and S_2. Then there exists a
2-cut C' separating S_1 from S_2. Let us "divide" G into two auxiliary graphs
G_1, G_2 as in Figure 3. Then C, together with an edge e ∈ C', generates two
cuts: an r-cut C_1 of G and of G_1 and a q-cut C_2 of G and of G_2, where
C_1 ∪ C_2 = C ∪ {e}, C_1 ∩ C_2 = {e}; hence, r + q = 6. Since both cuts divide
3-classes, r, q ≥ 3 holds; hence, r = q = 3. By [?], any 3-cut of G containing
an edge e in a 2-cut (i) divides a single 3-class and (ii) this 3-class belongs
to the (unique) cycle of the cycle-tree model that e belongs to. Therefore, C
divides exactly two 3-classes S_1 and S_2, these 3-classes belong to the same
cycle, L, of the cycle-tree, and the cuts C_1 and C_2 contain the edges
e_{S_1}(L) and e_{S_2}(L), respectively. Figure 3 presents an example, with the
bunch of 4-cuts defined by C. (Comment: notice that the cuts C_1 and C_2, and
thus also C, do not divide any 4-class.)

4 Model Dynamics 

Let us now turn to the model dynamics under insertion of an edge into G. That
is, we support updates Insert-Edge(u, v) where u, v ∈ V. Note that as a result
of an edge insertion, the only possible kind of modification in the connectivity
classes is that a subset of existing k-classes merges into a single k-class. For
proofs of the statements given below see [?].

Theorem 4. Let S ⊆ V be a k-class. The insertion of an edge between two
vertices in S can increase connectivity between vertices which belong to S only.
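
Since edge insertions can only merge existing k-classes, the partition itself can be recorded by any merge-only set structure. The sketch below uses a generic union-find for this bookkeeping; it is an assumption made for illustration and not the data structure of this paper (in particular, it does not decide which classes must merge, which is the job of the models described here, and its queries are amortized rather than worst-case constant). The class name ClassPartition is hypothetical.

```python
class ClassPartition:
    """Merge-only partition of a vertex set into connectivity classes."""

    def __init__(self, vertices):
        self.parent = {v: v for v in vertices}
        self.size = {v: 1 for v in vertices}

    def find(self, v):
        """Return the representative of the class containing v."""
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def merge(self, vertices):
        """Merge the classes of all given vertices (non-empty iterable)."""
        it = iter(vertices)
        root = self.find(next(it))
        for v in it:
            r = self.find(v)
            if r == root:
                continue
            if self.size[r] > self.size[root]:   # union by size
                r, root = root, r
            self.parent[r] = root
            self.size[root] += self.size[r]

    def same_class(self, u, v):
        """Answer a Same-Class(u, v) query."""
        return self.find(u) == self.find(v)
```

A caller would invoke merge once per update, passing one vertex from each class that the model reports as merged.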

Following [?], we perform the changes separately at each connectivity level. At
each level we translate the changes into several "local" changes. The changes
are done on single components, not on the entire G, and then the results at each
connectivity level are combined.

Paper [3] provides the incremental maintenance of the k-classes, 1 ≤ k ≤ 4,
of a connected graph. Paper [7] provides the incremental maintenance of the
5-classes of a 3-connected graph. Let us see what extension is needed for the
incremental maintenance of the 5-classes of a general connected graph. We
distinguish the following three cases of insertion of a new edge (u, v): the
vertices u and v are 3-, 2- or only 1-connected.

In the first case, let u, v belong to some 3-class S. By Theorem 4, no model
changes, except for the 2-level cactus tree model of the 3-component of S. The
algorithm for transforming a 2-level cactus tree model is given in [7].
In the third case, all three connectivity levels are influenced. By [?], the
entire transformation is reduced to a certain transformation of the bridge-tree
and to separate transformations for certain "involved" 2-components. Each one of
the latter transformations is exactly the same as if a certain edge were
inserted into the 2-component [?]. Thus we have a reduction to the second case.

The second case includes most of our work. In this case we insert a new edge
e between two vertices u and v in the same 2-class H of G, where v ∈ S, u ∈ T,
and S, T are distinct 3-classes contained in H. By Theorem 4, this insertion
does not increase connectivity between vertices which do not belong to H. In
the cycle-tree Q^3(G(H)), each node represents a 3-class. The involved 3-classes
are S, T and all 3-classes that separate them in the cycle-tree. In this
case we change two levels of connectivity. At the first level, we correct the
2-component of H and its model according to [?]. At the second level we correct
the involved 3-components and their models (by [?, Lemma 16], non-involved
3-components do not change as a result of such an edge insertion).

Paper [3] uses, as intermediate objects, the results of certain
T-transformations of the 3-components of S and T and their related models, and
certain H-transformations of the other involved 3-components and their related
models (for illustration see Figure 4). The T-transformation is caused by
breaking a certain edge e_u by a new vertex u and adding the new edge (v, u).
The H-transformation is caused by breaking certain edges e_u and e_v by new
vertices u and v, respectively, and adding the new edge (v, u).




Fig. 4. The T- and H-transformations. 



Paper [3] provides procedures for updating the cactus tree model of the
involved 3-components in T- and H-transformations. We generalize these
transformations to a 2-level cactus tree model. Paper [3] also provides a
procedure for merging the involved 3-components and their cactus tree models. We
generalize this procedure to merge 2-level cactus tree models. Our procedures
are much more complicated than in [3], since the 2-level cactus tree model is
substantially more complicated than the cactus tree model (see Theorem 2).

In this extended abstract we describe only the T-transformation. The proof of
its correctness, as well as descriptions and proofs for the H-transformation and
for the merge procedure, can be found in [?].
4.1 T-transformation

Let us see what changes occur in the 3- and 4-cuts of the 3-component of S as
the result of a T-transformation, where e_u = (x,y) (see Figure 5). Consider
3-cuts. First, a new 3-cut that separates u from the rest of S is added. Every
3-cut C_1 = (A, B) such that x ∈ A and y, v ∈ B creates two cuts. The first is
the 3-cut C_1' = (A, B ∪ {u}), which contains the edge (x, u) instead of the
edge e_u. The second is the 4-cut C_1'' = (A ∪ {u}, B), which contains the edges
(u,y) and (v,u) instead of the edge e_u. Each 3-cut which separates v from both
x and y gets the new edge (v,u) and becomes a 4-cut. Consider 4-cuts. Every
4-cut C_2 = (A, B) such that x ∈ A and y, v ∈ B creates the 4-cut
C_2' = (A, B ∪ {u}), which contains the edge (x,u) instead of the edge e_u.
Every 4-cut which separates v from both x and y gets the new edge (v,u),
becomes a 5-cut and goes out of consideration.





Fig. 5. Behavior of 3, 4-cuts under a T-transformation. 



Let us describe the transformed 2-level cactus tree model. In this paper we
consider only the case where no block of the model is a cube graph (the general
case is considered in [?]). We use the following notation (for illustration see
Figure 6). The path-of-edges-and-cycles between φ²(x) and φ²(y) in ℋ² is denoted
by P̃(e_u). Its bridges form, in general, a path which belongs to Π, denoted by
P(e_u). The shortest path of bridges and cycle-edges between φ²(x) and φ²(y)
is denoted by P̄(e_u). We denote the node or cycle which belongs to P̃(e_u) and
is the nearest to φ²(v) by Z_u. We define P_uv to be the path of
edges-and-cycles between Z_u and φ²(v). The bridge which is on the path between
φ²(x) (resp., φ²(y)) and Z_u and is the nearest to Z_u (if it exists) is denoted
by e_x (resp., e_y). Note that e_x, e_y ∈ P(e_u). By [3, Fact 5.1], in the case
Z_u is a cycle, Z_u has exactly one common cycle-edge with P̄(e_u); we denote
this cycle-edge by e_{Z_u}. Consider a path P of Π which intersects both P(e_u)
and P_uv. The sub-path (P ∩ (P(e_u) ∪ P_uv)) is called the intersection
cycle-generating sub-path (ICGS-path, for short) defined by P. In general, we
have four cases. Z_u can be a node or a cycle. In each case we have two
possibilities: there exist or do not exist ICGS-paths. Let us learn more about
the ICGS-paths.

Lemma 1. There exist at most two distinct ICGS-paths, and if there exist two, 
then at least one of them has exactly two bridges. 
Fig. 6. Example of T-transformation in the case Z_u is a cycle and there exists
an ICGS-path: (a) the original model; (b) the transformed model. (Two-arrowed
lines mark paths in Π.)



We use the following notation. By definition, each ICGS-path includes exactly
one of {e_x, e_y}. We assume, w.l.o.g., that if there exists a single ICGS-path,
it contains e_x; we denote it by P_x. In the case there are two ICGS-paths, we
denote by P_y one of them which has exactly two bridges. By Lemma 1 we have six
cases: Z_u is a node and there exist either zero, one or two ICGS-paths, or Z_u
is a cycle and there exist either zero, one or two ICGS-paths. The following
statements show that the third option can be reduced to the second one (so it is
not treated below), and that the sixth option cannot take place.

Lemma 2. If Z_u is a node and there exist two distinct ICGS-paths, then the
4-cut of the 3-component of S which is induced by the single non-minimal 2-cut
defined by P_y is represented twice in the model.

Lemma 3. If Z_u is a cycle then there exists at most one ICGS-path.

Following is the description of the T-transformation of the 2-level cactus tree
model of S (for illustration see Figure 6).

2-level-T-transformation(S, e_u, φ²(v)):

1. Find P(e_u), Z_u, P_uv, e_x, and P_x;

2. If P_x exists then break e_x by a new empty node X;

3. Else if Z_u is a cycle then break e_{Z_u} by a new empty node X;

   Else (P_x does not exist and Z_u is not a cycle) denote Z_u by X;

4. Perform Algorithm [?](S, Y, φ²(v));

5. Add a new node denoted by N_u, and a new bridge (N_u, X);

6. Replace path P(e_u) in Π by the two following paths:

   • The part of P(e_u) between φ²(x) and X (if not empty) plus (N_u, X);

   • The part of P(e_u) between φ²(y) and X (if not empty) plus (N_u, X);

7. Return(N_u, Y);

The update of the structural mapping is as follows. For any vertex w ∈ V such
that φ²(w) = N_w and the node N_w has been shrunk (with some other nodes)
into a node N_w', the new image of w is N_w'. The image of the new vertex u is
N_u.
4.2 General Example

In Figure 7 a general example is given. Part (a) presents the original graph
G, with the new edge shown by a dotted line. Part (b) shows the 3-quotient of
G, which has 4 nodes; the three nodes that are nontrivial 3-classes are
shown by shaded circles and an ellipse. Part (c) contains the three involved
3-components; the four virtual edges are the four vertical arcs in the middle of
the figure.




Fig. 7. General example. 



The 2-level cactus tree models for these 3-components are presented in part
(d). The left and the middle models are, basically, the cactus tree models. In
addition, there is one path in Π in the left model, shown by the two-arrowed
line; the corresponding edge of G is shown dashed. Here and below, thicker
structural edges belong to the cactus tree model of G, while thinner edges form
cycles representing 4-cuts (there are four such cycles of length 2 each in the
right model).

Parts (e)-(h) go upwards in the figure, so that all the parts are placed in
a cyclic order. In part (e), the updated involved 3-components are shown; the
updates in the left and right models cause T-transformations, while the update
of the middle model causes an H-transformation. Part (f) presents the 2-level
cactus tree models of the updated 3-components; the left and right models result
from the algorithm given in Section 4.1.
In part (g), these models are merged into the 2-level cactus tree model of
the united 3-component. This 3-component corresponds to the single nontrivial
3-class of the incremented graph; this 3-class is the union of the three
involved 3-classes of G. Part (h) shows the two-node 3-quotient of the
incremented graph. Inside the shaded node, the new 3-component is shown; its
single virtual edge (corresponding to the pair of edges going to the other,
trivial, 3-component) is shown dashed.

One more comment on the indication of paths in Π. Two-arrowed lines are longer
than necessary for pointing to all bridges in such a path; they go along cycle
edges of the model too. This is done to show, by the ends of such a line, where
the two end-vertices of the edge defining the path in Π are mapped by the
structural mapping.



5 Implementation 

For the implementation of the above transformations we extend the technique of
[?], which uses the dynamic trees of [?]. The main difficulty is that the model
graphs associated with 3-components are not trees, as in [?], but cactus trees.
As a result, the squeeze-cycle operation must be added. We implement it as
follows. First of all, a cycle L in a cactus tree is represented, as in [12,15],
by a special dummy vertex with an edge from it to every vertex of L, plus the
list of vertices of L in the cycle order. In this way a cactus tree is
implemented as a tree.

Suppose we have to squeeze a cycle L at its vertices x and y, resulting in two
cycles L_1 and L_2. Assume |L_1| ≤ |L_2|. We scan L from x in both directions
step by step. In time O(|L_1|) we arrive at y, thus finding L_1. Using O(|L_1|)
dynamic tree operations, we pull all vertices of L_1 out of L, forming two
separate cycles L_1 and L_2 with a single common vertex, instead of L, as
required.

In order to bound the time, we amortize the above dynamic tree operations over
vertex histories. Since |L_1| is at most |L|/2 above, any vertex changes the
cycle it belongs to at most log n times. Since a dynamic tree operation costs
O(log n), the total time sums to O(n log^2 n).
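
The two ingredients of this argument can be sketched in isolation. The first function below finds the shorter arc between x and y by scanning a circular doubly linked list in both directions in lockstep, in time proportional to the shorter arc. The second simulates repeated random squeezes and counts how many vertices are pulled out when the smaller side is always the one moved. Both are illustrations under assumed representations (cycles as next/prev dictionaries or as plain lists); they do not use dynamic trees, and the function names are hypothetical.

```python
import random


def shorter_arc(nxt, prv, x, y):
    """Interior vertices of the shorter arc between x and y on a cycle.

    `nxt` and `prv` map each vertex to its successor and predecessor;
    x and y are distinct vertices of the same cycle. The two scans
    advance in lockstep, so the running time is proportional to the
    length of the shorter arc.
    """
    fwd, bwd = [], []
    a, b = nxt[x], prv[x]
    while True:
        if a == y:
            return fwd
        if b == y:
            return bwd
        fwd.append(a)
        bwd.append(b)
        a, b = nxt[a], prv[b]


def total_moves_under_random_squeezes(n, seed=0):
    """Squeeze cycles at random until all are trivial; count moved vertices.

    Starts from one cycle on n vertices. Each squeeze pulls out the
    interior of the shorter arc, so a moved vertex lands in a cycle of
    at most half the previous size.
    """
    rng = random.Random(seed)
    cycles = [list(range(n))]
    moves = 0
    while True:
        candidates = [k for k, c in enumerate(cycles) if len(c) >= 2]
        if not candidates:
            return moves
        cyc = cycles.pop(rng.choice(candidates))
        i, j = sorted(rng.sample(range(len(cyc)), 2))
        merged = cyc[i]                      # label kept for the shrunk pair
        inner = cyc[i + 1:j]
        outer = cyc[j + 1:] + cyc[:i]
        moves += min(len(inner), len(outer))
        cycles.append([merged] + inner)
        cycles.append([merged] + outer)
```

In this simulation the total number of moved vertices never exceeds n log2 n, which is the quantity that is multiplied by the O(log n) cost of a dynamic tree operation in the bound above.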

Theorem 5. The 5-classes of an arbitrary graph can be maintained under any
sequence of q Same-5-Class(u,v)? queries and m Insert-Edge(u,v) updates in
total time O(q + m + n · log^2 n). The worst-case time per query is O(1).

Abstract. Given an undirected graph G = (V, E) and a positive integer
k, we consider the problem of augmenting G by the smallest number of
new edges to obtain a k-vertex-connected graph. In this paper, we show
that, for k ≥ 4 and k ≥ ℓ+2, an ℓ-vertex-connected graph G can be made
k-vertex-connected by adding at most δ(k − 1) + max{0, (δ − 1)(ℓ − 3) − 1}
surplus edges over the optimum in O(δ(k^[…] n^[…] + […])) time, where
δ = k − ℓ and n = |V|.

1 Introduction 

The problem of augmenting a graph by adding the smallest number of new edges
to meet vertex-connectivity requirements has been extensively studied as an important
subject in the network design problem [3], the data security problem, the graph
drawing problem [?] and others, and many efficient algorithms have been developed
so far.

Given an undirected graph G = (V, E) and a positive integer k, we consider the
problem of augmenting G by the smallest number of new edges to obtain a
k-vertex-connected (k-connected, for short) graph. We call this problem the
k-vertex-connectivity augmentation problem (k-VCAP, for short). Currently it is
known that k-VCAP for k ∈ {2,3,4} can be solved in polynomial time ([?], [?] and
[?] for k = 2, 3 and 4, respectively), where an initial graph G may not be
(k − 1)-connected. For an arbitrary integer k > 0, whether k-VCAP is
polynomially solvable or not is still an open question (even if an initial graph
is restricted to be (k − 1)-connected). When an initial graph is
(k − 1)-connected, Jordan presented an O(n^[…]) time approximation algorithm for
k-VCAP with a general k [?]. The difference between the number of new edges
added by his algorithm and the optimal value is at most (k − 2)/2.

However, it was an open question whether there exists a good approximation
algorithm for k-VCAP if an initial graph is not (k − 1)-connected. For arbitrary
integers k and δ ≥ 2, we consider whether there exists a polynomial time
algorithm that makes a given (k − δ)-connected graph k-connected by adding a set
E' of new edges such that the difference between |E'| and the optimal value opt
is small, say |E'| − opt = O(δk). One may apply Jordan's algorithm δ times to
obtain a k-connected graph. However, there is an example such that the number of
edges added by this procedure cannot be bounded by O(δk) over the optimum.

In this paper, for arbitrary k ≥ 4 and ℓ = k − δ, we consider the problem of
augmenting an ℓ-connected graph G by adding the smallest number of new edges in
order to make G k-connected. We first present a lower bound on the number of
edges that is necessary to make a given graph G k-connected, and then show that
the lower bound plus δ(k − 1) + max{0, (δ − 1)(ℓ − 3) − 1} edges suffices. The
task of constructing such a set of new edges can be done in
O(δ(k^[…] n^[…] + […])) time.

The paper is organized as follows. In Section 2 we state our main result that
k-VCAP is approximable within the absolute error O(δk) for a (k − δ)-connected
graph, after introducing some basic notations and deriving two lower bounds on
the optimal value of the problem. In Section 3 we describe an outline of our
approximation algorithm, called V-AUGMENT, for k-VCAP. In Section 4 we show that
the first lower bound can be computed in polynomial time. After stating several
properties of k-connected graphs in Section 5, we show in Section 6 some
previously known and newly derived edge-splitting operations (which are
procedures for replacing two edges with a single edge while preserving
k-connectivity). In Sections 7 and 8 we prove the correctness of V-AUGMENT. In
Section 9 we state some concluding remarks.



2 Main Theorem 

Let G = (V, E) stand for an undirected graph with a set V of vertices and a set E of
edges, where we denote |V| by n (or by n(G)) and |E| by m (or by m(G)). An edge with
end vertices u and v is denoted by (u, v). In G = (V, E), its vertex set V and edge set
E may be denoted by V(G) and E(G), respectively. A singleton set {x} may be simply
written as x. For a subset V' ⊆ V (resp., E' ⊆ E) in G, G[V'] (resp., G[E']) denotes
the subgraph induced by V' (resp., G[E'] = (V, E')). For V' ⊆ V (resp., E' ⊆ E),
we denote subgraph G[V − V'] (resp., G[E − E']) also by G − V' (resp., G − E'). For
E' ⊆ E, we denote V(G[E']) by V[E']. For an edge set E' with E' ∩ E = ∅, we denote
the augmented graph (V, E ∪ E') by G + E'. For two disjoint subsets of vertices
X, Y ⊆ V, we denote by E_G(X,Y) the set of edges e = (x,y) such that x ∈ X and
y ∈ Y, and also denote |E_G(X,Y)| by c_G(X,Y). In particular, E_G(u,v) is the set of
edges with end vertices u and v. A partition X_1, ..., X_t of the vertex set V means a
family of nonempty disjoint subsets of V whose union is V, and a subpartition of V
means a partition of a subset V' of V. For a subset X of V, a vertex v ∈ V − X is called
a neighbor of X if it is adjacent to some vertex u ∈ X, and the set of all neighbors
of X is denoted by Γ_G(X). A maximal connected subgraph G' in a graph G is called
a component of G (for notational convenience, a component H may be represented by
its vertex set X = V(H)), and we denote the set of all components in G by C(G) and
the number of components in G by p(G). A disconnecting set of G is defined as a
subset S of V such that p(G − S) > p(G) holds and no proper subset S' ⊂ S has this
property. The local vertex-connectivity κ_G(x,y) for two vertices x, y ∈ V is defined to
be the maximum number of internally-disjoint paths between x and y in G. By Menger's
theorem, κ_G(x,y) for nonadjacent vertices x and y is equal to the minimum size of a
disconnecting set that separates x and y. A component G' of G with |V(G')| ≥ 3 always
has a disconnecting set unless G is a complete graph K_n. For a connected G, a
disconnecting set of the minimum size is called a minimum disconnecting set, and its
size, denoted by κ(G), is called the vertex-connectivity of G; we define κ(G) = 0 if G is
not connected, and κ(G) = n − 1 if G is a complete graph K_n. A graph G is called
k-connected if κ(G) ≥ k. A subset T ⊂ V is called tight if Γ_G(T) is a minimum
disconnecting set in G. A tight set D is called minimal if no proper subset D' of D is
tight (hence a minimal tight set D induces a connected subgraph G[D]). We denote the
family of all minimal tight sets in G by 𝒟(G). Let t(G) be the maximum number of
pairwise disjoint minimal tight sets in G, and let β(G) = max{p(G − S) | S is a minimum
disconnecting set in G} (hence |𝒟(G)| ≥ t(G) ≥ β(G)).
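The basic quantities defined above can be illustrated on small graphs. The following is a minimal sketch, not part of the paper's algorithm: the adjacency-dictionary representation and the function names `neighbors`, `num_components` and `vertex_connectivity` are assumptions made for illustration, and the connectivity routine is a brute-force search that is only practical for tiny inputs.

```python
from itertools import combinations

def neighbors(adj, X):
    """Gamma_G(X): vertices outside X adjacent to at least one vertex of X."""
    X = set(X)
    return {v for u in X for v in adj[u]} - X

def num_components(adj, removed=()):
    """p(G - S): number of connected components left after deleting `removed`."""
    alive = set(adj) - set(removed)
    seen, count = set(), 0
    for start in alive:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count

def vertex_connectivity(adj):
    """kappa(G) by exhaustive search (exponential; tiny graphs only)."""
    n = len(adj)
    if n == 0 or num_components(adj) > 1:
        return 0                      # kappa(G) = 0 if G is not connected
    for size in range(1, n - 1):
        for S in combinations(adj, size):
            if num_components(adj, S) > 1:
                return size           # smallest vertex set whose removal disconnects G
    return n - 1                      # no such set exists: G is complete
```

For example, on the 4-cycle `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}` the set {0, 2} is a minimum disconnecting set, so the sketch reports connectivity 2.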

For an initial graph G and a fixed integer k ≥ 1, let opt_k(G) denote the optimal
value of the k-VCAP in G, i.e., the minimum size |E'| of a set E' of new edges to
obtain a k-connected graph G + E'. Several algorithms have been developed for
k-VCAP in the case where an initial graph G is (k − 1)-connected. These algorithms use
the following lower bound on opt_k(G). If κ(G) = k − 1, then we easily observe that
M(G) = max{⌈t(G)/2⌉, β(G) − 1} is a lower bound on opt_k(G). Eswaran and Tarjan
[2] proved that 2-VCAP can be solved by finding a set of M(G) edges. Watanabe and
Nakamura m stated the same result for 3-VCAP. Thus M(G) is indeed the optimal
value for κ(G) = k − 1 and k = 2,3, while it is known that M(G) can be smaller than
the optimal value for general k ≥ 4. It is reported in that 4-VCAP can be solved
in polynomial time for an arbitrary initial graph G. For k ≥ 5, Jordan proved fUJ
that k-VCAP with κ(G) = k − 1 can be solved by an approximation algorithm which
finds a solution with absolute error at most (k − 2)/2.

In what follows, we derive two types of lower bounds, α_k(G) and β_k(G) − 1, on
opt_k(G), where κ(G) is not necessarily k − 1.

We call a subset X ⊂ V dominating in G if V − X − Γ_G(X) = ∅, and non-dominating
if V − X − Γ_G(X) ≠ ∅.

To make G k-connected, it is necessary to add at least max{k − |Γ_G(X)|, 0} edges
between X and V − X − Γ_G(X) for any non-dominating set X ⊂ V. Given a family
𝒳 = {X_1, ..., X_p} of disjoint non-dominating sets, the total sum
Σ_{i=1,...,p} max{k − |Γ_G(X_i)|, 0} of "deficiencies" over 𝒳 is decreased by at most two
by adding one new edge to G. From this, we need at least ⌈α_k(G)/2⌉ new edges to make
G k-connected, where

\[
\alpha_k(G) = \max_{\mathcal{X}} \Big\{ \sum_{X \in \mathcal{X}} \big(k - |\Gamma_G(X)|\big) \Big\}, \tag{1}
\]

and the maximum is taken over all families 𝒳 of disjoint non-dominating sets.

(Note that t(G) = α_k(G) holds if κ(G) = k − 1.)

We now consider another case in which new edges become necessary. For a vertex
subset S ⊂ V of G with |S| = k − 1, let T_1, ..., T_q denote all the components in G − S,
where q = p(G − S). To make G k-connected, a new edge set E' must be added to G so
that all T_i form a single connected component in (G + E') − S. For this, it is necessary
to add at least p(G − S) − 1 edges to connect all components in G − S, where S is not
necessarily a minimum disconnecting set of G if κ(G) < k − 1. Here we define

\[
\beta_k(G) = \max \{\, p(G - S) \mid S \subset V,\ |S| = k - 1 \,\}. \tag{2}
\]

Thus at least β_k(G) − 1 new edges are necessary to make G k-connected. Define
γ_k(G) = max{⌈α_k(G)/2⌉, β_k(G) − 1}. The next lemma combines the above two lower
bounds.

Lemma 1. (Lower Bound) For a given graph G, it holds γ_k(G) ≤ opt_k(G). □
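The two lower bounds can be evaluated by exhaustive search on very small graphs. The sketch below is only an illustration: the names `alpha_k`, `beta_k` and `gamma_k` are hypothetical, the search is exponential, and the combined bound is assumed to be the maximum of ⌈α_k(G)/2⌉ and β_k(G) − 1, which is how the text combines the two bounds. Sets with non-positive deficiency are skipped because they can never increase the maximum in (1).

```python
from functools import lru_cache
from itertools import combinations

def neighbors(adj, X):
    X = set(X)
    return {v for u in X for v in adj[u]} - X

def num_components(adj, removed=()):
    alive = set(adj) - set(removed)
    seen, count = set(), 0
    for start in alive:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count

def beta_k(adj, k):
    """Equation (2): max p(G - S) over all vertex sets S with |S| = k - 1."""
    return max(num_components(adj, S) for S in combinations(adj, k - 1))

def alpha_k(adj, k):
    """Equation (1): best total deficiency over disjoint non-dominating sets."""
    V = list(adj)
    cands = []
    for r in range(1, len(V) + 1):
        for X in combinations(V, r):
            N = neighbors(adj, X)
            # non-dominating: V - X - Gamma(X) is nonempty; keep positive deficiency only
            if len(X) + len(N) < len(V) and len(N) < k:
                cands.append((frozenset(X), k - len(N)))

    @lru_cache(maxsize=None)
    def best(i, used):
        if i == len(cands):
            return 0
        X, deficiency = cands[i]
        skip = best(i + 1, used)
        if X & used:
            return skip
        return max(skip, deficiency + best(i + 1, used | X))

    return best(0, frozenset())

def gamma_k(adj, k):
    """Assumed combined bound: max(ceil(alpha_k / 2), beta_k - 1)."""
    return max(-(-alpha_k(adj, k) // 2), beta_k(adj, k) - 1)
```

On the star with center 0 and leaves 1, 2, 3 and k = 2, each leaf has deficiency 1 and removing the center leaves three components, so both bounds give 2, which matches the two edges needed to make the star 2-connected.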
In this paper, we prove the next result. 

Theorem 1. Let G be an ℓ-connected graph with ℓ ≥ 0. Then, for any integers k ≥ 4
and δ = k − ℓ ≥ 2, it holds

\[
opt_k(G) \le \gamma_k(G) + \delta(k - 1) + \max\{0,\ (\delta - 1)(\ell - 3) - 1\},
\]

and a feasible solution E' to k-VCAP with |E'| ≤ γ_k(G) + δ(k − 1) + max{0, (δ − 1)(ℓ − 3) − 1}
can be found in O(δ(k²n² + k³n^{3/2})) time, where n = |V(G)|. □

3 Outline of Algorithm 

In this section, we give a sketch of our algorithm for finding a set E' of new edges in
Theorem 1, where the algorithm also plays a role in proving the theorem.



3.1 s-Basal k-Connectivity

A graph H with a designated vertex s ∈ V(H), where H − s is denoted by G, is called
s-basally k-connected if

\[
|\Gamma_G(X)| + |\Gamma_H(s) \cap X| \ge k \quad \text{for all non-dominating sets } X \subset V(G) \text{ in } G \text{ with } |\Gamma_G(X)| + |X| \ge k, \tag{3}
\]

\[
|\Gamma_G(x)| + c_H(s, x) \ge k \quad \text{for all singleton sets } X = \{x\} \subset V(G). \tag{4}
\]

Claim. For an s-basally k-connected graph H, it holds

\[
|\Gamma_G(X)| + c_H(s, X) \ge k \quad \text{for all non-dominating sets } X \subset V \text{ in } G = H - s. \tag{5}
\]



3.2 Edge-Splitting Operation 

An edge-splitting operation is defined as follows. Given a graph H with a designated
vertex s and vertices u, v ∈ Γ_H(s) (possibly u = v), we construct graph H' from H
by deleting one edge from each of E_H(s,u) and E_H(s,v), and adding one new edge to
E_H(u,v): c_{H'}(s,u) := c_H(s,u) − 1, c_{H'}(s,v) := c_H(s,v) − 1, c_{H'}(u,v) := c_H(u,v) + 1,
and c_{H'}(x,y) := c_H(x,y) for all other pairs x, y ∈ V(H). In the case of u = v,
we interpret that c_{H'}(s,u) := c_H(s,u) − 2, c_{H'}(u,u) := c_H(u,u) + 1, and c_{H'}(x,y) :=
c_H(x,y) for all other pairs x, y ∈ V(H). We say that H' is obtained from H by splitting a
pair of edges (s,u) and (s,v) (or by splitting (s,u) and (s,v)). Conversely, we say that
H' is obtained from H by hooking up an edge (u,v) ∈ E(H − s) at s, if we construct
H' by replacing an edge (u,v) with two edges (s,u) and (s,v) in H.
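The splitting and hooking-up operations only change edge multiplicities, so they can be sketched directly on a multiset of edges. This is an illustrative sketch under assumed conventions: the multigraph is a `collections.Counter` keyed by sorted vertex pairs, vertex labels are assumed to be mutually comparable (strings here), and the names `edge_key`, `split` and `hook_up` are hypothetical.

```python
from collections import Counter

def edge_key(u, v):
    """Canonical key for the undirected edge (u, v); (u, u) is a loop."""
    return (u, v) if u <= v else (v, u)

def split(c, s, u, v):
    """Split (s,u) and (s,v) at s: remove both edges, add one edge (u,v)."""
    need = Counter([edge_key(s, u), edge_key(s, v)])   # u == v needs two copies of (s,u)
    if any(c[e] < m for e, m in need.items()):
        raise ValueError("edges to split are not present at s")
    out = Counter(c)
    out.subtract(need)
    out[edge_key(u, v)] += 1
    return +out                                        # drop entries with multiplicity 0

def hook_up(c, s, u, v):
    """Hook up (u,v) at s: replace one edge (u,v) by the two edges (s,u), (s,v)."""
    if c[edge_key(u, v)] < 1:
        raise ValueError("edge to hook up is not present")
    out = Counter(c)
    out[edge_key(u, v)] -= 1
    out[edge_key(s, u)] += 1
    out[edge_key(s, v)] += 1
    return +out
```

Hooking up the edge produced by a split restores the original multiplicities, which is the sense in which the two operations are converse to each other.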
3.3 Entire Algorithm

Given a graph G = (V, E), we start by computing the lower bound α_k(G). For
this, we add a new vertex s to G together with some new edges between s and V such
that each non-dominating set X in G with |Γ_G(X)| < k receives at least k − |Γ_G(X)| edges
between s and X, where multiple edges are allowed between s and a vertex v ∈ V.
In the resulting graph H*, we then split edges at s so that the vertex-connectivity of
H* − s increases one by one. During the algorithm, we may further add to H* − s some
new edges (which are not generated by splitting at s).

Algorithm V-AUGMENT 

Input: An undirected graph G = (V, E) and integers k ≥ 4 and ℓ ≥ 0 such that
|V| ≥ k + 1, κ(G) = ℓ and k − ℓ ≥ 2.

Output: A set E' of new edges with |E'| ≤ opt_k(G) + δ(k − 1) + max{0, (δ − 1)(ℓ − 3) − 1}
such that G* = G + E' is k-connected, where δ = k − ℓ.

Step I (Addition of vertex s and associated edges): Add a new vertex s
together with a set F* of edges between s and V such that the resulting graph
H* = (V ∪ {s}, E ∪ F*) is s-basally k-connected and F* is minimal subject to this
property.

Property 1. (1) The graph H* can be computed in O(min{k, √n}kn²) time.

(2) If |Γ_{H*}(s)| ≤ k, then there exists a set E' of at most δ(k − 1) new edges
such that G + E' is k-connected, and such E' can be found in
O(min{k, √n}((δ − 1)k³n + δkn²)) time.

(3) If |Γ_{H*}(s)| ≥ k + 1, then |F*| = α_k(G) holds. □

Based on this property, if |Γ_{H*}(s)| ≤ k, then we find an E' in Property 1(2) and
halt; we proceed to Step II otherwise.

Step II (Increasing the vertex-connectivity from ℓ to k − 1): Let j := ℓ and
H_ℓ := H*.

For j = ℓ, ..., k − 2, we repeatedly compute from H_j a graph H_{j+1} as in the next
property.

Property 2. For an s-basally k-connected graph H_j with κ(H_j − s) = j, there exists
an s-basally k-connected graph H_{j+1} with κ(H_{j+1} − s) = j + 1 such that H_{j+1} is
constructed from H_j by splitting some edges incident to s and by adding a set E_j
of at most max{2j − 2, j + 1} new edges. Moreover, such H_{j+1} can be computed
in O(min{k, √n}(k³n + kn²)) time. □

Thus, we obtain an s-basally k-connected graph
H_{k−1} = (V ∪ {s}, E ∪ F*_{k−1} ∪ E^1_{k−1} ∪ E^2_{k−1}) with κ(H_{k−1} − s) = k − 1 and
F*_{k−1} ⊆ F*, where E^1_{k−1} is the set of edges generated by splitting edges in
F* − F*_{k−1} at s, and E^2_{k−1} = E(H_{k−1} − s) − E − E^1_{k−1}
with |E^2_{k−1}| ≤ Σ_{i=ℓ,...,k−2} max{2i − 2, i + 1} (hence E^2_{k−1} is the set of edges
directly added to H_{k−1}).

Step III (Increasing the vertex-connectivity from k − 1 to k): From H_{k−1}
obtained in Step II, we compute an s-basally k-connected graph
H_k = (V ∪ {s}, E ∪ F*_k ∪ E^1_k ∪ E^2_k) as in the next property.

Property 3. For an s-basally k-connected graph H_{k−1} obtained in Step II, there
exists an s-basally k-connected graph H_k = (V ∪ {s}, E ∪ F*_k ∪ E^1_k ∪ E^2_k) with
F*_k ⊆ F* and E^2_k = E(H_k − s) − E − E^1_k, where E^1_k is the set of edges generated
by splitting edges in F* − F*_k at s, such that
|E^1_k ∪ E^2_k| ≤ |F* − F*_k|/2 + Σ_{i=ℓ,...,k−2} max{2i − 2, i + 1}
and G_k = G + (E^1_k ∪ E^2_k) (= H_k − s) can be made k-connected by adding a set E^3_k
of at most β(G) − |E^1_k ∪ E^2_k| − 1 or t(G_k) − 1 (≤ max{2k − 3, k + 1} − 1) new edges.
Moreover, such H_k and E^3_k can be found in O(min{k, √n}(k³n + kn²)) time. □

Then we augment G to a k-connected graph by adding the edge set
E' = E^1_k ∪ E^2_k ∪ E^3_k, where |E'| ≤ β(G) − 1 or
|E'| ≤ |F* − F*_k|/2 + Σ_{i=ℓ,...,k−2} max{2i − 2, i + 1} + t(G_k) − 1
holds. If |E'| ≤ β(G) − 1 (≤ opt_k(G)), then G + E' is an optimally augmented graph.
If |E'| ≤ |F* − F*_k|/2 + Σ_{i=ℓ,...,k−2} max{2i − 2, i + 1} + t(G_k) − 1, then by
t(G_k) ≤ |F*_k| and |F*| = α_k(G), we have

|F* − F*_k|/2 + Σ_{i=ℓ,...,k−2} max{2i − 2, i + 1} + t(G_k) − 1
  ≤ ⌈α_k(G)/2⌉ + ⌊t(G_k)/2⌋ − 1 + Σ_{i=ℓ,...,k−2} max{2i − 2, i + 1}
  ≤ opt_k(G) + k − 2 + Σ_{i=ℓ,...,k−2} max{2i − 2, i + 1}
  ≤ opt_k(G) + δ(k − 1) + max{0, (δ − 1)(ℓ − 3) − 1}. □
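The last inequality of Step III is pure arithmetic in k and ℓ and can be checked numerically. The sketch below assumes that the sum runs over i = ℓ, ..., k − 2 (one term per iteration of Step II); the function names `step2_edges`, `additive_gap` and `chain_holds` are hypothetical.

```python
def step2_edges(k, l):
    """Sum of max{2i - 2, i + 1} for i = l, ..., k - 2 (edges added in Step II)."""
    return sum(max(2 * i - 2, i + 1) for i in range(l, k - 1))

def additive_gap(k, l):
    """delta(k - 1) + max{0, (delta - 1)(l - 3) - 1} with delta = k - l."""
    delta = k - l
    return delta * (k - 1) + max(0, (delta - 1) * (l - 3) - 1)

def chain_holds(k, l):
    """Check (k - 2) + sum <= additive gap, the final step of the bound."""
    return (k - 2) + step2_edges(k, l) <= additive_gap(k, l)
```

Under this reading the inequality holds for every k ≥ 4 and ℓ ≤ k − 2 tried below, and it is tight for ℓ ≥ 4, for instance k = 8 and ℓ = 5 where both sides equal 24.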



By summing up the running time in Steps I, II and III (where we apply Property 2
at most δ times), the entire time complexity of V-AUGMENT is
O(δ min{k, √n}(kn² + k³n)) = O(δ(k²n² + k³n^{3/2})).

Remark: In general, it seems difficult to solve the maximization problem in (2) to
compute β_k(G) in polynomial time. However, from our algorithm, if
⌈α_k(G)/2⌉ + δ(k − 1) + max{0, (δ − 1)(ℓ − 3) − 1} + 1 < β_k(G), then β_k(G) can be
computed in polynomial time. □

4 Correctness of Step I 

In this section, we observe the correctness of Step I. For this, it suffices to prove
Property 1.

We say that a disconnecting set S ⊂ V disconnects two disjoint subsets Y and Y'
of V − S if no two vertices x ∈ Y and y ∈ Y' are connected in G − S. In particular, a
disconnecting set S disconnects vertices x and y in V − S if x and y are contained in
different components of G − S. A vertex subset X intersects another vertex subset Y
if none of the subsets X ∩ Y, X − Y and Y − X is empty. The following property holds for
two vertex subsets X and Y in G = (V, E):

\[
|\Gamma_G(X)| + |\Gamma_G(Y)| \ge |\Gamma_G(X \cap Y)| + |\Gamma_G(X \cup Y)|. \tag{6}
\]
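Inequality (6) is the submodularity of the neighbor-count function, and it can be confirmed by brute force on a small example. This is a sketch for illustration only; the helper names `all_subsets` and `inequality_6_holds` are hypothetical and the check enumerates all pairs of vertex subsets, so it is limited to very small graphs.

```python
from itertools import combinations

def neighbors(adj, X):
    """Gamma_G(X): vertices outside X adjacent to at least one vertex of X."""
    X = set(X)
    return {v for u in X for v in adj[u]} - X

def all_subsets(vertices):
    vs = list(vertices)
    for r in range(len(vs) + 1):
        for c in combinations(vs, r):
            yield frozenset(c)

def inequality_6_holds(adj):
    """Check |G(X)| + |G(Y)| >= |G(X & Y)| + |G(X | Y)| for all X, Y."""
    subs = list(all_subsets(adj))
    return all(
        len(neighbors(adj, X)) + len(neighbors(adj, Y))
        >= len(neighbors(adj, X & Y)) + len(neighbors(adj, X | Y))
        for X in subs
        for Y in subs
    )
```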



Proof of Property 1(1): We start with the graph H obtained from G by adding a
new vertex s and max{1, k − |Γ_G(v)|} edges between s and each vertex v ∈ V. It is not
difficult to see that H is s-basally k-connected. We then can check whether H − (s,v)
remains s-basally k-connected or not for each vertex v ∈ V with c_H(s,v) = 1 >
k − |Γ_G(v)| by computing κ_{H−(s,v)}(s,v). Hence, the s-basal k-connectivity of H − (s,v)
can be tested in O(m + min{k, √n}kn) time by using the network flow computation
PI on a sparse spanning subgraph of H with O(kn) edges, where such sparsification
takes O(m) time. Since there are O(n) such computations of κ_{H−(s,v)}(s,v), the
total time complexity for computing H* is O(min{k, √n}kn²). □

Proof of Property 1(2): To prove this property, we use the next lemma.

Lemma 2. Let C be a cycle in a k-connected graph G = (V, E) such that
κ(G − e) = k − 1 holds for every e ∈ E(C). Then there exists a vertex v ∈ V(C) with
|Γ_G(v)| = k. □

Assume that |Γ_{H*}(s)| ≤ k. We start from G_ℓ := G and continue to construct a
(j + 1)-connected graph G_{j+1} from G_j by adding a set E_j of new edges in the following
way until j = k − 1 holds.

For each j ∈ {ℓ, ..., k − 1}, since t(G_j) ≤ |Γ_{H*}(s)| ≤ k holds, G_j can be made
(j + 1)-connected by adding a set E_j of new edges with |E_j| ≤ k − 1 from Lemma 2. Note
that G_{j+1} = G_j + E_j satisfies κ(G_{j+1}) = j + 1 and t(G_{j+1}) ≤ k since (G_{j+1} ∪ {s}) + F*
remains s-basally k-connected.

Consequently, we can find a solution whose size is Σ_{i=ℓ,...,k−1} |E_i| ≤ (k − ℓ)(k − 1)
= δ(k − 1). □

Proof of Property 1(3): The minimality of F* implies that for each edge (s,v) ∈
E_{H*}(s,V) = F*, there is a set X_v ⊂ V with v ∈ X_v such that

(i) |Γ_G(X_v)| = k − 1, c_{H*}(s,X_v) = c_{H*}(s,v) = 1, and V − X_v − Γ_G(X_v) ≠ ∅, or

(ii) |X_v| = 1 and |Γ_G(v)| + c_{H*}(s,v) = k,

and no proper subset X' ⊂ X_v satisfies this property. For an integer i ∈ {0, 1, ..., k − 1},
we call a set T ⊂ V i-critical in H*, if T satisfies V − T − Γ_G(T) ≠ ∅, |Γ_G(T)| = i,
|Γ_G(T)| + |Γ_{H*}(s) ∩ T| = k, and c_{H*}(s,u) = 1 for each u ∈ Γ_{H*}(s) ∩ T, and no
proper subset T' ⊂ T satisfies this property for the fixed i. Note that c_{H*}(s,T) =
|Γ_{H*}(s) ∩ T| = k − i holds for an i-critical set T with i ∈ {0, 1, ..., k − 1}. We call a
singleton set T = {v} with v ∈ V k-critical if |Γ_G(v)| + c_{H*}(s,v) = k and c_{H*}(s,v) > 0
hold. Note that the above set X_v satisfying (i) (resp., (ii)) is (k − 1)-critical (resp.,
k-critical). Thus,

Lemma 3. Each v ∈ Γ_{H*}(s) is contained in a (k − 1)-critical set or a k-critical set. □



By using (6), we can prove the next property.

Lemma 4. Assume that H* has an i-critical set T_i and a j-critical set T_j such that
T_i and T_j intersect each other in G. If |Γ_{H*}(s)| ≥ k + 1, then T_i ∪ T_j contains an
(i + j − h)-critical set T with Γ_{H*}(s) ∩ (T_i ∪ T_j) ⊆ T, where h = |Γ_G(T_i ∩ T_j)|. □

Let 𝒯 be a family of i-critical sets T ⊂ V, 0 ≤ i ≤ k, such that Γ_{H*}(s) ⊆ ∪_{T∈𝒯} T
holds and |𝒯| is the minimum; Lemma 3 says that such 𝒯 exists. Then we can observe
by Lemma 4 that if |Γ_{H*}(s)| ≥ k + 1, then every two sets in 𝒯 are pairwise disjoint. By
|Γ_{H*}(s)| ≥ k + 1, for a minimum family 𝒯 of i-critical sets with Γ_{H*}(s) ⊆ ∪_{T∈𝒯} T, we
have |F*| = c_{H*}(s,V) = Σ_{T∈𝒯} (k − |Γ_G(T)|) ≤ α_k(G). Moreover,
if |F*| < α_k(G), then it is not difficult to see that at least one set X ∈ 𝒯 (or {x} ∈ 𝒯)
would violate (3) or (4). Hence |F*| ≥ α_k(G) also holds. □
5 Structure of k-Connected Graphs

Before proving the correctness of Steps II and III of V-AUGMENT, we review some
properties of a k-connected graph, which will be a basis for deriving edge-splitting
operations in these steps.

Lemma 5. UDI Let G be k-connected. If t(G) ≥ k + 1, then any two minimal tight
sets X, Y ∈ 𝒟(G) are pairwise disjoint (i.e., t(G) = |𝒟(G)|). □

For a subset S ⊂ V in G, we call the components in G − S the S-components. Note that
the vertex set S is a disconnecting set in a connected G if and only if p(G − S) ≥ 2. A
tight set T is called a superleaf, if T contains exactly one minimal tight set D ∈ 𝒟(G)
and no superset T' ⊃ T satisfies this property. The following lemmas summarize some
properties of superleaves.

Lemma 6. pQ Let G = (V, E) be a connected graph with t(G) ≥ κ(G) + 3.

(1) For every minimal tight set, as well as every superleaf, the induced subgraph is
connected.

(2) For each minimal tight set D ∈ 𝒟(G), there is a unique superleaf Q containing D.

(3) Every two superleaves are pairwise disjoint. Hence, a superleaf Q is disjoint from
all other minimal tight sets in 𝒟(G), except for the one in 𝒟(G) contained in Q. □

We call a disconnecting set S a shredder if p(G − S) ≥ 3.
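A shredder can be located on a small graph by testing every vertex set of a given size. The sketch below is illustrative only: the function name `shredders` is hypothetical, and it assumes the caller passes size = κ(G) for a connected G, so that every returned set is automatically a minimum disconnecting set.

```python
from itertools import combinations

def num_components(adj, removed=()):
    """p(G - S): number of connected components left after deleting `removed`."""
    alive = set(adj) - set(removed)
    seen, count = set(), 0
    for start in alive:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count

def shredders(adj, size):
    """All vertex sets S with |S| = size whose removal leaves at least 3 components."""
    return [frozenset(S) for S in combinations(adj, size)
            if num_components(adj, S) >= 3]
```

In the star with center 0 and three leaves, the center alone is a shredder, while a path has none because removing one vertex leaves at most two components.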

Lemma 7. [8,9] Let G = (V, E) satisfy t(G) ≥ κ(G) + 3, and S be a shredder with
|S| = κ(G). If an S-component T ∈ C(G − S) contains a minimal tight set D ∈ 𝒟(G),
but no other minimal tight set in 𝒟(G) − D, then T is the superleaf with T ⊇ D. □

Lemma 8. [8,9] Let S be a shredder with |S| = κ(G) in a connected graph G = (V, E).
If p(G − S) ≥ κ(G) + 1, then every superleaf Q in G satisfies Q ∩ S = ∅. □



Theorem 2. ^ Lemma 5.8] Let k ≥ 2 and G = (V, E) be a (k − 1)-connected graph
such that t(G) ≥ max{2k − 2, k + 2} and β(G) ≤ ⌈t(G)/2⌉. Suppose that G has a
shredder S with |S| = κ(G) such that every S-component contains exactly one minimal
tight set. Then t(G) = 2k − 2 holds and the minimum number of edges required to make
G k-connected is 2k − 4 if G is a complete bipartite graph K_{k−1,k−1}, and
k − 2 + ⌈(k − 1)/2⌉ otherwise. Moreover, such a set of edges can be found in O(n) time if
all minimal tight sets in G have been found. □

We show a new property of a shredder in a k-connected graph (the proof is omitted).

Lemma 9. Let S be a shredder with |S| = κ(G) in a connected graph G = (V, E).
Assume that every S-component is a superleaf in G.

(1) If p(G − S) ≥ κ(G) + 1, then every minimum disconnecting set S_1 other than S
satisfies p(G − S_1) = 2 and S_1 ∩ D = ∅ for all D ∈ 𝒟(G), and has an S_1-component
T_1 ⊆ Q for some S-component Q ∈ C(G − S).

(2) If p(G − S) ≥ κ(G) + 2, then for any subset W ⊂ V with |W| = κ(G) + 1 and
p(G − W) ≥ 2, there is at most one minimal tight set D ∈ 𝒟(G) with D ∩ W ≠ ∅. □
6 Edge-Splitting Preserving k-Connectivity

6.1 Edge-Splitting in (k − 1)-Connected Graphs

Let H = (V ∪ {s}, E) be a graph with a designated vertex s and |V| ≥ k + 1 such that
κ_H(x,y) ≥ k holds for all distinct two vertices x, y ∈ V (which is equivalent to the
condition that |Γ_H(X)| ≥ k for all non-dominating sets X ⊂ V in G). Let G = H − s,
and assume that G is not k-connected. Then κ(G) = k − 1 holds, since G satisfies
κ_G(x,y) ≥ κ_H(x,y) − 1 ≥ k − 1 for all x, y ∈ V. Note that Γ_H(s) ∩ D ≠ ∅ holds
for every minimal tight set D ∈ 𝒟(G) since otherwise κ_H(x,y) ≥ k cannot hold for
x ∈ D and y ∈ V − D − Γ_G(D). A pair {(s,u), (s,v)} of two edges in E_H(s) is called
k-splittable, if the graph H' resulting from splitting edges (s,u) and (s,v) satisfies
κ_{H'}(x,y) ≥ k for all pairs x, y ∈ V.

The following theorems describe some conditions that admit k-splittable splittings
at s in a (k − 1)-connected graph G.

Theorem 3. [Q Let H = (V ∪ {s}, E) be a graph with a designated vertex s, k ≥ 2
be an integer such that κ_H(x,y) ≥ k for all pairs x, y ∈ V, and let G = H − s satisfy
κ(G) = k − 1 and t(G) ≥ k + 2. Assume that G has three distinct superleaves Q_1, Q_2
and Q_3 such that Γ_G(Q_1) ∩ Q_2 = ∅ = Γ_G(Q_1) ∩ Q_3 holds and Γ_G(Q_1) is not a shredder
in G. Let D_i ⊆ Q_i, i = 1,2,3, be the minimal tight set in 𝒟(G). Then, for any three
vertices x_i ∈ Γ_H(s) ∩ D_i, i = 1,2,3, at least one of {(s,x_1), (s,x_2)}, {(s,x_2), (s,x_3)}
and {(s,x_3), (s,x_1)} is k-splittable. Moreover t(H' − s) = t(G) − 2 holds for the resulting
graph H' from the splitting. □

Theorem 4. Let H = (V ∪ {s}, E) be a graph with a designated vertex s, k ≥ 2
be an integer such that κ_H(x,y) ≥ k for all pairs x, y ∈ V, and let G = H − s satisfy
κ(G) = k − 1 and t(G) ≥ max{2k − 2, k + 2}. Let Q ⊂ V be an arbitrary superleaf
such that S = Γ_G(Q) is a shredder in G. If G has a set T* ∈ C(G − S) − Q with
c_H(s,T*) ≥ 2, then {(s,x), (s,y)} is k-splittable for any vertices x ∈ Γ_H(s) ∩ Q and
y ∈ Γ_H(s) ∩ T*. □

Based on Theorems 3 and 4 we can show the following three new properties on
k-splittable pairs (the proofs are omitted).

Lemma 10. Let H = (V ∪ {s}, E) be a graph with a designated vertex s, k ≥ 2 be
an integer such that κ_H(x,y) ≥ k for all pairs x, y ∈ V, and let G = H − s satisfy
κ(G) = k − 1, t(G) ≥ max{2k − 2, k + 2, β(G) + 1}, and β(G) − 1 ≥ ⌈t(G)/2⌉ ≥ k − 1.
Then there is a k-splittable pair {(s,x), (s,y)} such that β(G + (x,y)) = β(G) − 1. □

Lemma 11. Let H = (V ∪ {s}, E) be a graph with a designated vertex s, k ≥ 2 be an
integer such that κ_H(x,y) ≥ k for all pairs x, y ∈ V, and let G = H − s satisfy κ(G) =
k − 1 and β(G) = t(G) ≥ max{2k − 2, k + 2}. Let S* be a shredder in G with |S*| = k − 1
and p(G − S*) = β(G). Assume that G has an S*-component T_1 ∈ C(G − S*) and an
edge e = (v_1, v_2) such that v_1 ∈ T_1, v_2 ∈ T_1 ∪ S*, and p(G − S*) = p(G − e − S*). Let
H_1 = H − e + {(s,v_1), (s,v_2)}. Then there is an edge-splitting of a pair {(s,u_1), (s,u_2)}
at s in H_1 such that the graphs H' = H_1 − {(s,u_1), (s,u_2)} + {(u_1,u_2)} and G' = H' − s
satisfy κ_{H'}(x,y) ≥ k for all pairs x, y ∈ V, κ(G') = k − 1 and β(G' − S*) = β(G − S*) − 1.
□
Lemma 12. Let H = (V ∪ {s}, E) be a graph with a designated vertex s, k ≥ 2 be
an integer such that κ_H(x,y) ≥ k for all pairs x, y ∈ V, and let G = H − s satisfy
κ(G) = k − 1, t(G) ≥ max{2k − 2, k + 2}, and β(G) ≤ ⌈t(G)/2⌉. Then (a) there is a
k-splittable pair {(s,x), (s,y)} such that t(G + (x,y)) ≤ t(G) − 1, or (b) t(G) = 2k − 2
holds and G can be made k-connected by adding at most 2k − 4 new edges. □

6.2 Edge-Splitting in an Arbitrary Graph 

In this section, we consider an s-basally k-connected graph H = (V ∪ {s}, E) with a
designated vertex s. A pair {(s,u), (s,v)} of two edges in E_H(s) is called k-feasible at
s, if the graph H' resulting from splitting edges (s,u) and (s,v) remains s-basally
k-connected. We show two properties on a k-feasible edge-splitting at s in H (the proofs
are omitted).

Theorem 5. Let H = (V ∪ {s}, E) be an s-basally k-connected graph with a designated
vertex s, and k_1 be an integer with k ≥ k_1 ≥ 2 such that G = H − s satisfies
κ(G) = k_1 − 1 and t(G) ≥ k_1 + 2. Then there is a subgraph H_1 = (V ∪ {s}, E_1 = E(G) ∪ F')
of H such that F' ⊆ E_H(s), |F'| = |𝒟(G)| and c_{H_1}(s,D) = 1 holds for all D ∈ 𝒟(G). For
such H_1, if a pair {(s,v_1), (s,v_2)} is k_1-splittable at s in H_1, then {(s,v_1), (s,v_2)} is
k-feasible at s in H. □



Theorem 6. Let H = (V ∪ {s}, E) be an s-basally k-connected graph with a designated
vertex s and k_1 be an integer with k ≥ k_1 ≥ 1 such that G = H − s satisfies κ(G) =
k_1 − 1 and t(G) ≥ k_1 + 2. Assume that G has a minimum disconnecting set S* with
p(G − S*) ≥ k_1 + 2 such that every S*-component Q_i ∈ {Q_1, ..., Q_p} = C(G − S*)
is a superleaf in G, where p = t(G) = p(G − S*) holds (by Lemmas and0) and
D_i ∈ 𝒟(G) denotes the minimal tight set contained in Q_i. Then there is a subset
F' = {e_i, e'_i | i = 1,2,...,p} ⊆ E_H(s) such that for e_i = (s,u_i) and e'_i = (s,v_i), i =
1,2, ... ,p, Ui,Vi £ Di holds. For such F' , the graph H' — H — (s, Wi)}+ 

s-basally k-connected and G' = H' − s satisfies κ(G') = k_1. □

7 Correctness of Step II 

To show the correctness of Step II, it suffices to prove Property 2. For this, we present
an algorithm, called k-SPLIT1, which finds from a given s-basally k-connected graph H
with κ(H − s) ≤ k − 2 an s-basally k-connected graph H* with κ(H* − s) = κ(H − s) + 1
by splitting some edges incident to s and by adding at most
max{2κ(H − s) − 2, κ(H − s) + 1} new edges.

Algorithm k-SPLIT1(H, s, k)

Input: An s-basally k-connected graph H = (V ∪ {s}, E ∪ F) with a designated vertex
s, F = E_H(s), and |V| ≥ k + 1. Let k_1 = κ(H − s) + 1 and G = H − s.

Output: An s-basally k-connected graph H* = (V ∪ {s}, E ∪ F*_0 ∪ E*_1 ∪ E*_2), where
F*_0 ⊆ F, E*_1 is the set of edges generated by splitting the edges in E_H(s) − F*_0 at
s, and E*_2 is a set of new edges connecting vertices in V, such that κ(H* − s) = k_1
and |E*_2| ≤ max{2k_1 − 4, k_1}.
Step 1. If k_1 = 1, then choose a subset F' = {e_i = (s,u_i), e'_i = (s,v_i) | i =
1,2,...,p} ⊆ E_H(s) such that u_i, v_i ∈ D_i ∈ {D_1, D_2, ..., D_p} = 𝒟(G) holds for
each i = l,2,...,p. Then the graph H* ~ H + "“0}“ Vi-i), 

(s,u_i)} is s-basally k-connected and H* − s is connected by Theorem 6. Halt
outputting H*, where E*_2 = ∅.

If k_1 ≥ 2, then we go to Step 2 after setting H_2 := (V ∪ {s}, E ∪ F_2) with a minimal
subset F_2 ⊆ F such that every D_i ∈ 𝒟(G) satisfies c_{H_2}(s,D_i) ≥ 1.

Step 2. Let G_2 := H_2 − s. If t(G_2) ≤ max{2k_1 − 3, k_1 + 1}, then G_2 can be made
k_1-connected by adding a set E*_2 of at most max{2k_1 − 4, k_1} new edges by Lemma 2.
Halt after outputting H* := H_2 + (F − F_2) ∪ E*_2.

Otherwise, every D_i ∈ 𝒟(G) satisfies c_{H_2}(s,D_i) = 1 (by Lemma 5) and let
Γ_{H_2}(s) ∩ D_i = {u_i}. Repeat the following procedure (A) or (B) while
t(G_2) ≥ max{2k_1 − 2, k_1 + 2}, where we execute procedure (A) if β(G_2) ≤ ⌈t(G_2)/2⌉ or
procedure (B) otherwise.

Procedure (A): 

(A-1) If there is a k_1-splittable pair {(s,u), (s,v)} such that H' = H_2 − {(s,u),
(s,v)} + {(u,v)} with G' = H' − s satisfies t(G') ≤ t(G_2) − 1, then continue
executing Step 2 after setting H_2 := H_2 − {(s,u), (s,v)} + {(u,v)}.

(A-2) Otherwise, by Lemma 12, G_2 can be made k_1-connected by adding a set
E*_2 of at most 2k_1 − 4 new edges. Halt outputting H* := H_2 + (F − F_2) ∪ E*_2.

Property 4. Each iteration of procedure (A-1) decreases t(G_2) by at least one. □

Procedure (B): Choose a minimum disconnecting set S* in G_2 satisfying
p(G_2 − S*) = β(G_2), where S* is a shredder in G_2 (by β(G_2) ≥ ⌈t(G_2)/2⌉ + 1 ≥
max{k_1, (k_1 + 2)/2 + 1} ≥ 3).

(B-1) If t(G_2) ≥ β(G_2) + 1, then find a k_1-splittable pair {(s,u), (s,v)} in Lemma
10 such that H' = H_2 − {(s,u), (s,v)} + {(u,v)} with G' = H' − s satisfies
β(G') ≤ β(G_2) − 1. Continue executing Step 2 after setting H_2 := H_2 −
{(s,u), (s,v)} + {(u,v)}.

(B-2) Otherwise (β(G_2) = t(G_2)), by Lemma 7 and t(G_2) ≥ k_1 + 2, every
S*-component T ∈ C(G_2 − S*) is a superleaf in G_2. Moreover, Lemma 8 and
p(G_2 − S*) ≥ k_1 tell that every superleaf Q in G_2 satisfies Q ∩ S* = ∅. Hence
we have p{G 2 — S*) = t{G 2 ) > ki + 2. Let D := {D G D{G)\ch 2 {s, D) = 
1} and (s,Vi) G EH[s,Di) — {(s,Ui)} for Di G D (such Vi exists since H 
satisfies @ with respect to fc > fci -I- 1). Let FFs ~ H + U}L 2 {(^*-ii 
UjLjiC'*! ^*- 1)1 ('* 1 '*^0}- Halt outputting H* ~ H 3 , where = 0 and H 3 
remains s-basally fc-connected by Theorem El 

Property 5. Each iteration of (B-1) decreases β(G_2) by one. □

During execution of Step 2, no new edge is added to H_2 until immediately before
constructing the final H*. Then it is clear that the output
H* = (V ∪ {s}, E ∪ F*_0 ∪ E*_1 ∪ E*_2) is constructed from H by splitting edges in F − F*_0
(where E*_1 denotes the set of the resulting split edges) and adding edges in E*_2. We
prove that the output H* is s-basally k-connected, and satisfies κ(H* − s) = k_1 and
|E*_2| ≤ max{2k_1 − 4, k_1}. If H* is output in Step 1, then by Theorem 6 the output H*
is s-basally k-connected, and satisfies κ(H* − s) = k_1 = 1 and E*_2 = ∅. In procedures
(A) and (B), we see from Theorem 5 that H_2 ∪ (F − F_2) is s-basally k-connected. In
procedure (A), Lemma 12 ensures that we can execute either (A-1) or (A-2) if
β(G_2) ≤ ⌈t(G_2)/2⌉, and this proves Property 4. In procedure (B), we can execute either
(B-1) or (B-2) by Lemma 10 and by Theorem 6, respectively, where Theorem 6 is
applicable to (B-2) by Lemmas 7 and 8. Property 5 follows from Lemma 10. Therefore,
by Properties 4 and 5, H_2 satisfies t(G_2) ≤ max{2k_1 − 3, k_1 + 1}, or the condition of
(A-2) or (B-2), after t(H_2 − s) + β(H_2 − s) ≤ n + k iterations of Step 2. If
t(G_2) ≤ max{2k_1 − 3, k_1 + 1} holds in Step 2, then by Lemma 2 we can obtain a desired
graph H*.

The algorithm k-SPLIT1(H, s, k) can be implemented to run in
O(min{k_1, √n}(k_1³n + k_1n²)) time as follows. The running time of k-SPLIT1(H, s, k) is
equal to the complexity of finding a sequence of k_1-splittable pairs in H_2, plus the
complexity of computing E*_2. The complexity of finding a sequence of k_1-splittable pairs
in H_2 is equal to that of finding k-splittable pairs in [8,9]. The edge set E*_2 can be
computed in O(min{k_1, √n}k_1n) time by applying Phase 5 of Jordan's algorithm in j 1 1 Ij .



8 Correctness of Step III 

In this section, we prove the correctness of Step III by presenting an algorithm for
computing a graph H_k in Property 3.

Given a graph H with a designated vertex s, we say that a graph H' is obtained
from H by shifting an edge (s,u) to (s,v) at s, if we construct H' by replacing (s,u)
with (s,v) in H.
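Shifting, like splitting, only moves multiplicity between two edges at s. The sketch below shows it on an edge multiset; the `Counter` representation keyed by sorted vertex pairs and the names `edge_key` and `shift` are assumptions made for illustration, not notation from the paper.

```python
from collections import Counter

def edge_key(u, v):
    """Canonical key for the undirected edge (u, v)."""
    return (u, v) if u <= v else (v, u)

def shift(c, s, u, v):
    """Shift one edge (s,u) to (s,v) at s: replace a copy of (s,u) by (s,v)."""
    if c[edge_key(s, u)] < 1:
        raise ValueError("edge to shift is not present at s")
    out = Counter(c)
    out[edge_key(s, u)] -= 1
    out[edge_key(s, v)] += 1
    return +out                      # drop entries with multiplicity 0
```

A shift leaves the degree of s unchanged, in contrast to a split, which lowers it by two.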

Algorithm k-SPLIT 2(77, s, 7^’, fe) 

Input: An undirected graph H = (V ∪ {s}, E ∪ F*_{k−1} ∪ E*_{k−1}) and an integer k ≥ 2
such that F*_{k−1} = E_H(s) and κ(H − s) = k − 1 hold, and κ_H(x,y) ≥ k holds for
every x, y ∈ V. Let H' = (V ∪ {s}, E ∪ F*_{k−1} ∪ F') denote the graph obtained from
H by hooking up all edges in E*_{k−1} at s, where F' is the set of the edges hooked
up.

Output: An undirected graph H* = (V ∪ {s}, E ∪ F*_k ∪ E*_k) obtained from H' by
splitting edges in F*_{k−1} ∪ F' and by shifting some split edges such that H* satisfies
κ_{H*}(x,y) ≥ k for all x, y ∈ V and G* = H* − s satisfies κ(G*) = k − 1 and one of the
following (a) - (c), where E_{H*}(s) = F*_k ⊆ F*_{k−1} ∪ F'.

(a) t(G*) ≤ max{2k − 3, k + 1}.

(b) t(G*) = 2k − 2 holds and G* can be made k-connected by adding at most
2k − 4 new edges.

(c) The graph (V, E ∪ E*_k) can be made k-connected by adding at most
p((V, E) − S) − |E*_k| − 1 new edges.

Step 1. After setting H_1 := H, F := E_H(s), and E' := E(H_1) − F − E, we go to
Step 2.

Step 2. Let G_1 := H_1 − s. If t(G_1) ≤ max{2k − 3, k + 1} holds, then halt outputting
H* := H_1.

Otherwise, while t(G_1) ≥ max{2k − 2, k + 2} holds, repeat the following procedure
(A) or (B), where we execute procedure (A) if β(G_1) ≤ ⌈t(G_1)/2⌉ or procedure
(B) otherwise.
Procedure (A):

(A-1) If there is a k-splittable pair {(s,u_1), (s,u_2)} such that H' = H_1 − {(s,u_1),
(s,u_2)} + {(u_1,u_2)} satisfies t(H' − s) ≤ t(G_1) − 1, then continue executing
Step 2 after setting H_1 := H_1 − {(s,u_1), (s,u_2)} + {(u_1,u_2)}, F := F −
{(s,u_1), (s,u_2)}, and E' := E' ∪ {(u_1,u_2)}.

(A-2) Otherwise, by Lemma 12, t(G_1) = 2k − 2 holds and G_1 can be made
k-connected by adding at most 2k − 4 new edges. Output H* := H_1 as a solution
satisfying (b).

Property 6. Each iteration of (A-1) decreases t(G_1) by at least one. □

Procedure (B): Choose a minimum disconnecting set S* in Gi satisfying p(Gi — 

S*) = /3(Gi), where S* is a shredder in Gi by f3{Gi) > [t(Gi)/2] -1-1 > max{fc, (fe-l- 
2)/2-bl} >3. 

(B- 1 ) If t(Gi) > f){Gi) -I- 1 then, find a fc-splittable pair {(s, mi), (s, M 2 )} in 
Lemma II 1)1 so that H' = Hi — {(s, Mi), (s, M 2 )} -I- {(mi, M 2 )} with G' = H' — s 
satishes P{G') < P{Gi) — 1. After setting Hi H' , E E — {{s, Mi), (s, M 2 )}, 
and E' := E' U {(mi,M 2 )}, continue executing Step 2. 

(B-2) Otherwise (/3(Gi) = t(Gi)), let S* be a shredder in Gi with |S*| = fc — 1 
and p(Gi — S*) = /3(Gi). We distinguish the following three subcases in (B-2). 

(B-2-1) Gi has an edge e = (mi,M 2 ) £ E' with {mi,M 2 } — 5* 7 ^ 0 and p{Gi — 
e — S*) = p{Gi — S*). Then after hooking up the edge e at s, we split a 
pair {(s,Mi), (s,M 2 )} in Lemma. im such that the resulting graph H' ;= Hi — 
{(s,Mi), (s,M 2 ), (mi,M 2 )} -|-{(mi,M 2 ), (s, M l), (s,M 2 )} Satisfies KH,{x,y) > k for 
all x,y £V and G' = H' — s satisfies n{G') = fc — 1, and !3{G' — S*) = /3(Gi — 
S'*) — 1. After setting Hi := H' , E ~ P U {(s. Mi), (s, M 2 )} — {(s. Mi), (s, M 2 )}, 
and E' E' VJ {(mi,M 2 )} — {(mi,M 2 )}, continue executing Step 2. 

(B-2-2) There is an edge e = (mi,M 2 ) £ E' with Mi, M 2 £ S*. We replace the 
edge e with a new edge e' = (mi,ms) for a vertex M 3 £ V — S* . Note that 
H' := Hi — e + e' also satisfies kh' {x, y) > k for all a;, y £ V hy DC\S* =0 for 
all D £ I>(Gi) (by Lemma 0 and we have V(e') — S* 7 ^ 0 and p{{H' — s) — 
e — S*) = p{{H' — s) — S*). Then go to (B-2-1) after setting Hi ;= H' , and 
E' —E'uie'j-ie}. 

(B-2-3) Every edge (m, m) £ E' satisfies m, m £ V* — S* and p((Gi — {e}) — S*) = 
p{Gi — S*) -I- 1. Output H* := Hi as an solution satisfying (c). 

Property 7. Each iteration of (B-1) and (B-2-1) decreases β(G1) by at least one.

□ 

Let us prove that a graph H* = Hi in (B-2-3) satisfies condition (c). Since m, m £ 
V — S* and p((Gi — {e}) — S*) = p(Gi — S*) -I- 1 hold for all edges e = (m, m) £ E' , 
we have p{(y,E) - S*) = p{Gi - S*) -b \E'\. Let {Ti,...,Tp} = C(Gi - S*), where 
p = p(Gi — S*). For each Ti £ C(Gi — S*), let Ui £ Fhi (s) ODi with Ti ^ Di G T>{Gi). 
We show that for a set Ei = {(mi, Mi+i)|i = 1, . . . ,p — 1} of new edges, G' Gi U Ei 
is fc-connected. If G' has a disconnecting set S' G V with S' 7 ^ S* and IS"'! = k — 1, 
then by Lemma 0(1) p{G' — S') = 2 holds and there is an S''-component T' C Ti for 
some Ti £ C(Gi — S*), which contradicts that the edge (ui,Ui+i) or {ui-i, Ui) connects 
T'(3 Di) and some Tj £ C(Gi — S'*) — {T} in G' . Therefore, G' = (V) E U E") with 
E" — E’^VJ El is fc-connected, and |Ei| = p((V) E) — S) — \E1\ — 1 holds. 

We see that the algorithm k-SPLYT 2{H, s, F, k) runs in 0(min{fc, ^/n} k'n?) time, 
as observed in the analysis of the complexity of k-SPLIT1(H, s, k).

9 Concluding Remarks

In this paper, we gave a polynomial time algorithm for augmenting a given ℓ-connected
graph G to a k-connected graph by adding at most δ(k − 1) + max{0, (δ − 1)(ℓ − 3) − 1}
surplus edges over the optimum for k ≥ 4, ℓ ≥ 0, and δ = k − ℓ. However, in the case
of ℓ + 1 = k ≥ 4, Jordan's algorithm produces at most (k − 2)/2 surplus edges
over the optimum. Closing the gap between this and our bound is left as future
work.

Abstract. This paper deals with the problem of finding a minimum-cost
vertex subset S in an undirected network such that for each vertex v we
can send d(v) units of flow from S to v. Although this problem is NP-hard
in general, Tamura et al. have presented a greedy algorithm for solving
the special case with uniform costs on the vertices. We give a simpler
proof of the validity of the greedy algorithm using linear programming
duality and improve the running time bound from O(n²M) to O(nM),
where n is the number of vertices in the network and M denotes the time
for max-flow computation in the network with n vertices and m edges.
We also present an O(n(m + n log n)) time algorithm for the special case
with uniform demands and arbitrary costs.



1 Introduction 

Let 𝒩 = (G, u, d, c) be an undirected network on the underlying graph G =
(V, E) with the vertex set V and the edge set E. Let n = |V| and m = |E|. It is
endowed with a capacity function u : E → R+, a demand function d : V → R+,
and a cost function c : V → R+, where R+ denotes the set of nonnegative reals.
This paper addresses the problem of finding a minimum-cost vertex subset S ⊆ V
such that for each v ∈ V we can send d(v) units of flow from S to v.

For a pair of disjoint subsets X, Y ⊆ V, we denote by λ(X, Y) the maximum
flow value between X and Y in 𝒩. We simply write λ(v, Y) and λ(X, w) for
v, w ∈ V instead of λ({v}, Y) and λ(X, {w}), respectively. For convenience, we
assign λ(X, Y) = +∞ if X ∩ Y ≠ ∅. Then the problem is formulated as follows.

$$
\begin{array}{ll}
\text{Minimize}   & \sum_{v \in S} c(v) \\
\text{subject to} & S \subseteq V, \\
                  & \lambda(S, v) \ge d(v) \quad (v \in V).
\end{array}
\tag{1}
$$



M.M. Halldorsson (Ed.): SWAT 2000, LNCS 1851, pp. 300-E31 2000. 
(c) Springer-Verlag Berlin Heidelberg 2000

We call this problem Source Location. We say that a vertex set S ⊆ V covers
a vertex v if λ(S, v) ≥ d(v). Namely, Source Location asks for a minimum-cost
subset S ⊆ V that covers all the vertices in V.

A special case of this problem with a constant cost function was introduced
by Tamura et al. They called it the plural cover problem. They first considered
the case in which both d and c are constant and described an algorithm that
runs in O(n²M) time, where M denotes the time complexity for computing
an s-t maximum flow in a given network 𝒩. Later Tamura et al.
showed that a simple greedy algorithm solves problem Source Location in
O(n²M) time even if the demand function d is arbitrary while the cost function
c is still constant. Ito et al. described another algorithm to improve the time
complexity to O(npM), where p is the number of distinct values of d(v) (v ∈ V),
i.e., p = |{d(v) | v ∈ V}|.

In this paper, we analyze the greedy algorithm of Tamura et al. to give
a simpler proof based on the linear programming duality. We then improve the
greedy algorithm to run in O(nM) time.

As for the case in which the demand function d is constant, we give an
O(n(m + n log n)) time algorithm. The algorithm makes use of maximum
adjacency (MA) ordering (see Section 4 for MA ordering). The MA ordering has
been used by Nagamochi and Ibaraki for solving the problems of minimum cut
and of edge-connectivity augmentation.

Finally, we show that Source Location is in general NP-hard. We show this
by reducing the knapsack problem to Source Location. Hence, it remains open
to prove the NP-hardness in the strong sense or to devise a pseudo-polynomial
time algorithm.

We summarize the time complexity of Source Location in Table 1, where
bold letters indicate the results obtained in this paper.

The rest of the paper is organized as follows. Section 2 formulates Source
Location as an integer programming problem. Section 3 considers Source
Location when the cost function c is constant, and Section 4 discusses the case
in which the demand function d is constant. In Section 5, we show that Source
Location is in general NP-hard.



2 Integer Programming Formulation 

In this section, we formulate Source Location as an integer programming
problem with an exponential number of constraints.

A cut is a proper nonempty subset of V. For a cut X, let ΔX denote the set
of edges that cross X, i.e., ΔX = {e | e = (v, w) ∈ E, v ∈ X, w ∈ V − X}, and
κ(X) its capacity, i.e.,

$$
\kappa(X) = \sum_{e \in \Delta X} u(e).
$$

If X = {v}, we write κ(v) instead of κ({v}). For a disjoint pair of vertex subsets
X, Y ⊆ V, we denote κ(X, Y) = Σ_{e ∈ ΔX ∩ ΔY} u(e). For v ∈ V, we simply write
κ(X, v) instead of κ(X, {v}).

We also denote by d(W) the maximum demand in W, i.e.,
d(W) = max{d(v) | v ∈ W}.

We say that a vertex v attains the maximum demand in W if d(v) = d(W).
A cut W is called deficient if κ(W) < d(W). If a cut W is deficient and no
proper subset X ⊂ W is deficient, W is called a minimal deficient set.

Table 1. Summary of the results on Source Location

                 c: constant                     c: arbitrary
  d: constant    O(n²M)   Tamura et al. (1992)   O(n(m + n log n))
                 O(nM)    Ito et al. (1997)
                 O(n(m + n log n))
  d: arbitrary   O(n²M)   Tamura et al. (1998)   NP-hard
                 O(npM)   Ito et al. (1997)
                 O(nM)

M: the time complexity for computing a maximum s-t flow in 𝒩.
p: the number of distinct values of d(v) (v ∈ V).

Lemma 1. Let 𝒩 = (G = (V, E), u, d, c) be an undirected network. Then
S ⊆ V covers all vertices in V if and only if S ∩ W ≠ ∅ holds for every minimal
deficient set W.

Let 𝒲 = {W_1, W_2, ..., W_l} be the family of all the minimal deficient sets
and let V = {v_1, v_2, ..., v_n}. Define an l × n matrix A = (A_ij) by A_ij = 1 if
v_j ∈ W_i and A_ij = 0 otherwise. From Lemma 1, Source Location can be
written as the following 0-1 integer programming problem:

$$
\begin{array}{lll}
\text{Minimize}   & \sum_{j=1}^{n} c_j x_j & \\
\text{subject to} & \sum_{j=1}^{n} A_{ij} x_j \ge 1 & (i = 1, 2, \ldots, l), \\
                  & x_j \in \{0, 1\} & (j = 1, 2, \ldots, n),
\end{array}
\tag{2}
$$

where c_j = c(v_j) (j = 1, 2, ..., n), and x = (x_1, x_2, ..., x_n) is the characteristic
vector of a subset of V.

Remark 1. Note that l = |𝒲| might be exponential in n = |V| and m = |E|.
For example, let us consider a network 𝒩 = (G = (V, E), u, d, c), where V =
{v_1, v_2, ..., v_n}, E = {(v_1, v_i) | i = 2, 3, ..., n}, u(e) = 1 (e ∈ E) and



d{vi) 



ifz = 0 
0 otherwise, 



and c is an arbitrary cost function. Then we can see that 



W = {W\ |VF| 



n 

2 



+ l,W^v^}, 



and hence 




□ 
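The definitions of Section 2 can be checked by brute force on a tiny network. The following is a minimal sketch, not part of the original paper: the function names (`kappa`, `minimal_deficient_sets`, `covers_all`, `constraint_matrix`) and the three-vertex instance are illustrative assumptions. It enumerates all cuts (exponential time), builds the matrix A of (2), and uses the max-flow min-cut theorem to test coverage without a flow routine.

```python
from itertools import combinations

def kappa(X, cap):
    """Capacity of the cut X: total capacity of edges with exactly one end in X."""
    return sum(c for (a, b), c in cap.items() if (a in X) != (b in X))

def minimal_deficient_sets(n, cap, demand):
    """Enumerate minimal deficient sets by increasing size (exponential time)."""
    found = []
    for r in range(1, n):                      # cuts are proper nonempty subsets
        for X in map(set, combinations(range(n), r)):
            deficient = kappa(X, cap) < max(demand[v] for v in X)
            # Smaller minimal sets were found earlier, so X is minimal
            # exactly when none of them is a proper subset of X.
            if deficient and not any(W < X for W in found):
                found.append(X)
    return found

def covers_all(n, cap, demand, S):
    """For nonempty S: S covers v iff v is in S or every cut X with v in X and
    X disjoint from S has kappa(X) >= d(v) (max-flow min-cut)."""
    for v in set(range(n)) - S:
        rest = [w for w in range(n) if w not in S and w != v]
        for r in range(len(rest) + 1):
            for extra in combinations(rest, r):
                if kappa({v, *extra}, cap) < demand[v]:
                    return False
    return True

def constraint_matrix(n, family):
    """Rows of the 0-1 matrix A: A[i][j] = 1 iff vertex j lies in the i-th set."""
    return [[1 if j in W else 0 for j in range(n)] for W in family]
```

On the path 0 - 1 - 2 with capacities 3 and 1 and demands (1, 2, 3), the minimal deficient sets are {2} and {0, 1}, and a nonempty S covers all vertices exactly when it meets both, as Lemma 1 states.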



3 The Uniform Cost Case 

3.1 A Greedy Algorithm 

In this section, we consider Source Location with a constant cost function.
Tamura et al. proposed the following greedy algorithm to solve Source
Location.

Algorithm Greedy

Step 0: Arrange the vertices v_1, v_2, ..., v_n in V such that d(v_1) ≤ d(v_2) ≤ ... ≤
d(v_n).

Step 1: Initialize j := 1 and S := V.

Step 2: If S − {v_j} covers all vertices in V, then S := S − {v_j}.

Step 3: If j = n then output S and halt. Otherwise, j := j + 1 and go to Step 2.

□
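Algorithm Greedy can be sketched in a few lines once a max-flow routine is available. The code below is an illustration under stated assumptions, not the authors' implementation: it uses a simple Edmonds-Karp routine on an adjacency matrix (the paper leaves the max-flow method open and only calls its cost M), and the names `max_flow`, `covers`, and `greedy` are hypothetical. Vertices are 0, ..., n − 1 and `cap` maps an undirected edge (a, b) to its capacity.

```python
from collections import deque

def max_flow(n, cap, sources, sink):
    """Max flow from the vertex set `sources` to `sink` (sink not in sources).
    Node n acts as a super source joined to every source by an infinite edge."""
    INF = float("inf")
    res = [[0] * (n + 1) for _ in range(n + 1)]
    for (a, b), c in cap.items():          # an undirected edge gives two arcs
        res[a][b] += c
        res[b][a] += c
    for s in sources:
        res[n][s] = INF
    flow = 0
    while True:
        parent = [-1] * (n + 1)
        parent[n] = n
        queue = deque([n])
        while queue and parent[sink] == -1:          # BFS for a shortest path
            a = queue.popleft()
            for b in range(n + 1):
                if parent[b] == -1 and res[a][b] > 0:
                    parent[b] = a
                    queue.append(b)
        if parent[sink] == -1:
            return flow
        push, b = INF, sink
        while b != n:                                # bottleneck capacity
            push = min(push, res[parent[b]][b])
            b = parent[b]
        b = sink
        while b != n:                                # augment along the path
            res[parent[b]][b] -= push
            res[b][parent[b]] += push
            b = parent[b]
        flow += push

def covers(n, cap, S, v, demand):
    # lambda(S, v) = +infinity by convention when v belongs to S
    return v in S or max_flow(n, cap, S, v) >= demand[v]

def greedy(n, cap, demand):
    order = sorted(range(n), key=lambda v: demand[v])    # Step 0
    S = set(range(n))                                    # Step 1
    for vj in order:                                     # Steps 2 and 3
        T = S - {vj}
        if all(covers(n, cap, T, v, demand) for v in range(n)):
            S = T
    return S
```

For the path 0 - 1 - 2 with capacities 3 and 1 and demands (1, 2, 3), `greedy(3, {(0, 1): 3, (1, 2): 1}, [1, 2, 3])` drops vertex 0 and keeps vertices 1 and 2.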



Example 1. Let us apply Greedy to the network 𝒩 = (G = (V, E), u, d, c)
given in Figure 1, where u and d are respectively attached to edges and vertices
in Figure 1, and c(v) = 1 for all v ∈ V. The results are illustrated in Figure 2.
We initially include all vertices in S and check whether the set S − {v_1} covers
all vertices or not. Since it covers v_1 (and hence all vertices in V), we update
S := S − {v_1} (see (i)). We next check whether S − {v_2} covers all vertices
or not. Since it covers both v_1 and v_2, we update S := S − {v_2} (see (ii)). By
repeating this argument to the current S = {v_3, v_4, ..., v_7} (see (iii)-(vii)), we
finally obtain the set S shown in (viii). □

Fig. 1. A network 𝒩 = (G = (V, E), u, d, c), where c(v) = 1 (v ∈ V).



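In order to show the correctness of algorithm Greedy, we consider the linear
programming relaxation of (2):

$$
\begin{array}{lll}
\text{Minimize}   & \sum_{j=1}^{n} c_j x_j & \\
\text{subject to} & \sum_{j=1}^{n} A_{ij} x_j \ge 1 & (i = 1, 2, \ldots, l), \\
                  & x_j \ge 0 & (j = 1, 2, \ldots, n),
\end{array}
\tag{3}
$$

and its dual:

$$
\begin{array}{lll}
\text{Maximize}   & \sum_{i=1}^{l} y_i & \\
\text{subject to} & \sum_{i=1}^{l} A_{ij} y_i \le c_j & (j = 1, 2, \ldots, n), \\
                  & y_i \ge 0 & (i = 1, 2, \ldots, l).
\end{array}
\tag{4}
$$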

Recall that c_j = 1 (j = 1, 2, ..., n) is assumed in this section.

We also replace Steps 1 and 2 in algorithm Greedy as follows.

Step 1': Initialize j := 1, S := V and y_i := 0 for i = 1, 2, ..., l.

Step 2': (2-1) If S − {v_j} covers all vertices in V, then S := S − {v_j}.

(2-2) Otherwise, choose a W_i ∈ 𝒲 with W_i ∩ S = {v_j}, and y_i := 1.

Note that Step 1' (initialization of y) in the revised version might take
exponential time (since |𝒲| might be exponential). However, this causes no trouble
since we are now interested in the validity of the algorithm. Obviously, the
algorithm always keeps a feasible solution S (i.e., S covers all vertices in V).

Fig. 2. Algorithm Greedy applied to the network 𝒩 in Figure 1

Let x* and y* be the primal and dual variables obtained at the end of the
revised greedy algorithm. Note that x* is the characteristic vector of the output
S of the algorithm.

The algorithm does not delete v_j from S if and only if it updates y_i as y_i := 1
for some i with W_i ∩ S = {v_j}. Hence, at the termination, we have

$$
\sum_{j=1}^{n} x_j^* = \sum_{i=1}^{l} y_i^*.
\tag{5}
$$

Therefore, x* is a 0-1 solution satisfying (2) and (5). By the weak duality of
linear programming problems (3) and (4), we only need to prove the feasibility
of y* in (4) to show the correctness of the algorithm Greedy. The feasibility of
y* will be proved by Lemmas 2 and 3 given below.

Recall that the cut capacity function κ satisfies

$$
\kappa(X) + \kappa(Y) \ge \kappa(X - Y) + \kappa(Y - X) \quad (X, Y \subseteq V).
\tag{6}
$$

A set function satisfying (6) is called posi-modular.

Lemma 2. Let 𝒩 = (G = (V, E), u, d, c) be an undirected flow network.
Let W_1 and W_2 be minimal deficient sets in 𝒩, and for each i = 1, 2, let v_i ∈ W_i
be a vertex that attains the maximum demand in W_i. If W_1 ∩ W_2 ≠ ∅, then we
have v_1 ∈ W_1 ∩ W_2 or v_2 ∈ W_1 ∩ W_2.

Proof. Suppose, to the contrary, that both v_1 ∈ W_1 − W_2 and v_2 ∈ W_2 − W_1
hold. Since W_1 and W_2 are deficient sets, d(v_1) > κ(W_1) and d(v_2) > κ(W_2)
hold. It follows from (6) that

$$
d(v_1) + d(v_2) > \kappa(W_1) + \kappa(W_2) \ge \kappa(W_1 - W_2) + \kappa(W_2 - W_1).
$$

This means that d(v_1) > κ(W_1 − W_2) or d(v_2) > κ(W_2 − W_1) holds. Since we
have v_1 ∈ W_1 − W_2 and v_2 ∈ W_2 − W_1 by the assumption, it follows that W_1 − W_2
or W_2 − W_1 is deficient, which contradicts the minimality of W_1 or W_2. □

Arrange the columns of A in such a way that d(v_1) ≤ d(v_2) ≤ ... ≤ d(v_n).
For each index i with 1 ≤ i ≤ l, let k(i) denote the maximum number k with
v_k ∈ W_i. Then Lemma 2 implies that the matrix A does not contain

$$
\begin{array}{c|ccc}
    & j & k(i_1) & k(i_2) \\ \hline
i_1 & 1 & 1      & 0      \\
i_2 & 1 & 0      & 1
\end{array}
\tag{7}
$$

as a submatrix.



Lemma 3. The dual variable y* obtained by the revised greedy algorithm is
feasible to (4).

Proof. Suppose, to the contrary, that y* is infeasible. There is a pair of distinct
rows i_1, i_2 and a column j such that y*_{i_1} = y*_{i_2} = 1 and A_{i_1 j} = A_{i_2 j} = 1. Let
j_0 be the largest such number j, where we assume that the columns of A are
already arranged in such a way that d(v_1) ≤ d(v_2) ≤ ... ≤ d(v_n). Then we
have k(i_1) ≠ k(i_2) since otherwise y*_{i_1} and y*_{i_2} must be updated in the same
iteration in Step 2', a contradiction. Note that j_0 = k(i_1) implies y*_{i_2} = 0 by the
greedy algorithm. Hence we have j_0 < k(i_1). Similarly, we also have j_0 < k(i_2).
Furthermore, we have A_{i_1 k(i_2)} = 0 due to the definition of j_0. Similarly, we have
A_{i_2 k(i_1)} = 0. This implies that A contains submatrix (7) forbidden by Lemma 2. □

We have thus shown the following.

Theorem 1. If the cost function c is constant, then algorithm Greedy produces
an optimal solution of Source Location.



3.2 An Efficient Implementation 

We now analyze the time complexity of algorithm Greedy. Steps 0, 1 and 3
are clearly executed in O(n log n), O(1) and O(n) time, respectively. As for Step
2, Tamura et al. checked if S − {v_j} covers all vertices in V by computing
λ(S − {v_j}, v_i) (i.e., a max flow from S − {v_j} to v_i) for all v_i. Clearly, this requires
O(nM) time, where M is the time complexity for computing a maximum s-t flow
in the network 𝒩. Since Step 2 is iterated n times, the required time is
O(n²M) in total.

However, the following lemma implies that Step 2 can be replaced by
Step 2'': If S − {v_j} covers v_j, then S := S − {v_j}.



Lemma 4. If S − {v_j} covers v_j in Step 2 of algorithm Greedy, then S − {v_j}
covers v_i for all i < j.

Proof. Assume that some v_i with i < j is not covered by S − {v_j}. Then there
exists a cut X with v_i ∈ X, V − X ⊇ S − {v_j}, and κ(X) < d(v_i). Then,
S ∩ X ⊆ {v_j} clearly holds. Moreover, we have S ∩ X = {v_j} since otherwise S
does not cover v_i, which contradicts the property that Greedy always keeps a
feasible set S. Hence, X separates v_j and S − {v_j}. Since κ(X) < d(v_i) ≤ d(v_j),
it follows that S − {v_j} does not cover v_j, a contradiction. □

Thus we have improved the time complexity.

Theorem 2. If the cost function c is constant, then problem Source Location
can be solved in O(nM) time.
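The effect of replacing Step 2 by Step 2'' can be seen in a short sketch: each iteration performs a single max-flow computation from S − {v_j} to v_j, so n computations suffice. As before, this is an illustrative sketch under assumptions, not the authors' code; the Edmonds-Karp routine and the names `max_flow` and `greedy_fast` are hypothetical, and the function also returns the number of max-flow calls to make the O(nM) count visible.

```python
from collections import deque

def max_flow(n, cap, sources, sink):
    """Edmonds-Karp max flow from the set `sources` to `sink` (sink not in
    sources); node n is a super source with infinite edges to the sources."""
    INF = float("inf")
    res = [[0] * (n + 1) for _ in range(n + 1)]
    for (a, b), c in cap.items():
        res[a][b] += c
        res[b][a] += c
    for s in sources:
        res[n][s] = INF
    flow = 0
    while True:
        parent = [-1] * (n + 1)
        parent[n] = n
        queue = deque([n])
        while queue and parent[sink] == -1:
            a = queue.popleft()
            for b in range(n + 1):
                if parent[b] == -1 and res[a][b] > 0:
                    parent[b] = a
                    queue.append(b)
        if parent[sink] == -1:
            return flow
        push, b = INF, sink
        while b != n:
            push = min(push, res[parent[b]][b])
            b = parent[b]
        b = sink
        while b != n:
            res[parent[b]][b] -= push
            res[b][parent[b]] += push
            b = parent[b]
        flow += push

def greedy_fast(n, cap, demand):
    """Greedy with Step 2'': only the coverage of v_j itself is tested."""
    order = sorted(range(n), key=lambda v: demand[v])   # Step 0
    S = set(range(n))                                   # Step 1
    calls = 0
    for vj in order:
        calls += 1                                      # one max flow per vertex
        if max_flow(n, cap, S - {vj}, vj) >= demand[vj]:
            S.discard(vj)                               # Step 2''
    return S, calls
```

On the path 0 - 1 - 2 with capacities 3 and 1 and demands (1, 2, 3), the sketch returns the same set {1, 2} as the original Step 2 would, using three max-flow computations.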
4 The Uniform Demand Case

In this section, we consider Source Location with a constant demand function
d. We assume that d(v) = g (a fixed positive real) holds for all v ∈ V. We
show that it can be solved in O(n(m + n log n)) time without maximum flow
computation. A key tool of the algorithm is the maximum adjacency (MA)
ordering.

An ordering v_1, v_2, ..., v_n of all vertices in V is called a maximum adjacency
(MA) ordering if it satisfies

κ({v_1, v_2, ..., v_i}, v_{i+1}) ≥ κ({v_1, v_2, ..., v_i}, v_j) for 1 ≤ i < j ≤ n.

The MA ordering plays a crucial role in this section through the following lemma.

Lemma 5. Let G = (V, E) be an undirected graph with a nonnegative
capacity function u. Then, the following statements hold.

(i) An MA ordering v_1, v_2, ..., v_n can be computed in O(m + n log n) time.
(ii) The last two vertices v_{n−1} and v_n for every MA ordering in G satisfy

λ(v_{n−1}, v_n) = κ(v_n). (8)

□

We mention here that we can choose the first vertex v_1 arbitrarily.
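An MA ordering can be computed by repeatedly picking the unvisited vertex with the largest total capacity to the vertices already chosen. The sketch below is an assumption-laden illustration: it uses Python's binary heap with lazy deletion, which gives O(m log n) rather than the O(m + n log n) bound of Lemma 5(i) (that bound needs a heap with constant-time key increase, such as a Fibonacci heap). The names `ma_ordering` and `is_ma_ordering` are hypothetical; `adj` maps each vertex to a dictionary of neighbor capacities and is assumed symmetric with mutually comparable vertex labels.

```python
import heapq

def ma_ordering(adj, start):
    """Return an MA ordering of the vertices of `adj` beginning at `start`."""
    key = {v: 0 for v in adj}            # capacity toward the chosen prefix
    heap = [(float("-inf"), start)] + [(0, v) for v in adj if v != start]
    heapq.heapify(heap)
    done, order = set(), []
    while heap:
        _, v = heapq.heappop(heap)
        if v in done:
            continue                     # stale entry left by lazy deletion
        done.add(v)
        order.append(v)
        for w, c in adj[v].items():
            if w not in done:
                key[w] += c
                heapq.heappush(heap, (-key[w], w))
    return order

def is_ma_ordering(adj, order):
    """Check the defining inequality of an MA ordering directly."""
    for i in range(1, len(order)):
        prefix = set(order[:i])
        def attach(v):
            return sum(c for w, c in adj[v].items() if w in prefix)
        if any(attach(order[i]) < attach(v) for v in order[i + 1:]):
            return False
    return True
```

For the graph with edges 0-1 (3), 0-2 (2), 1-2 (1), 2-3 (4), starting from vertex 0 yields the ordering 0, 1, 2, 3; the last vertex has κ(3) = 4, which is also the minimum cut value between vertices 2 and 3, in line with (8).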

Let us note that, if the demand function d is constant, minimal deficient sets
are pairwise disjoint by the posi-modularity (6) of κ, i.e.,

W_1 ∩ W_2 = ∅

holds for every pair of W_1 and W_2 in 𝒲. Therefore, in order to solve Source
Location, we try to find all minimal deficient sets W ∈ 𝒲 and construct a
minimum-cost source set S ⊆ V by choosing from each W ∈ 𝒲 a vertex v ∈ W
with the minimum cost c(v) among W.

Since any source set S must contain v ∈ V such that κ(v) < g, we initialize S
as S := {v ∈ V | κ(v) < g}. To make use of MA orderings, we attach a new vertex
s (∉ V) to a given network 𝒩 and, for each vertex v ∈ S, add the edge (s, v)
with the capacity u(s, v) = g. By this modification of 𝒩, every vertex v ∈ V
satisfies κ(v) ≥ g, i.e., either κ(v) ≥ g holds in the original network or v ∈ S
(i.e., the (modified) network 𝒩 contains the edge (s, v) with u(s, v) = g). We
then apply to the network 𝒩 an MA ordering v_0 (= s), v_1, ..., v_n starting
from s. By Lemma 5, we have

λ(v_{n−1}, v_n) = κ(v_n) ≥ g.

Namely, every cut X that separates v_{n−1} and v_n satisfies κ(X) ≥ g. This means
that every minimal deficient set W ∈ 𝒲 (in the original network) that separates
v_{n−1} and v_n forms W = {v_{n−1}} or {v_n}, since by the modification of 𝒩, such a
W must contain a vertex v ∈ V such that κ(v) < g in the original network, and
hence we have |W| = 1. Since we already checked whether a cut X of the type
X = {v} (v ∈ V) is deficient, we do not have to consider the cut X separating
v_{n−1} and v_n. We thus merge the vertices v_{n−1} and v_n into a single vertex v, and
check if v satisfies κ(v) ≥ g. Since κ(v) < g implies that W = {v_{n−1}, v_n} is a
minimal deficient set in the original network, if κ(v) < g, we update the network
𝒩 by adding edge (s, v) with the capacity u(s, v) = g, and update S by adding
v_{n−1} if c(v_{n−1}) ≤ c(v_n); otherwise, v_n.

Now we have κ(v) ≥ g for all vertices except for s in the resulting network
𝒩. By repeating the above argument for 𝒩 (i.e., we apply MA ordering v_0 (=
s), v_1, ..., v_{h−1}, v_h to 𝒩, merge the last two vertices v_{h−1} and v_h, and so on), we
can compute a minimum-cost source set S. Formally it can be written as follows.

Algorithm Contract

Input: A network 𝒩 = (G = (V, E), u, d, c), where d(v) = g for all v.

Output: A minimum-cost vertex set S ⊆ V which covers all vertices in V.

Step 0: Initialize S := ∅, V' := V ∪ {s}, E' := E, and α(v) := v for all v ∈ V.

Step 1: For each vertex v ∈ V such that κ(v) < g, put E' := E' ∪ {(s, v)},
u(s, v) := g, and S := S ∪ {α(v)}.

Step 2:

(2-I) Compute an MA ordering v_0 (= s), v_1, ..., v_{h−1}, v_h starting from s in
G' = (V', E').

(2-II) Merge the last two vertices v_{h−1} and v_h in G' into a single vertex v.
Denote the resulting graph by G' again.

(2-III) If c(α(v_{h−1})) ≤ c(α(v_h)), then α(v) := α(v_{h−1}); otherwise, α(v) :=
α(v_h).

(2-IV) If κ(v) < g in the current G', then update E' := E' ∪ {(s, v)},
u(s, v) := g, and S := S ∪ {α(v)}.

Step 3: If |V'| ≤ 2 or E' contains the edges (s, v) for all v ∈ V' − {s}, then
output S and halt. Otherwise go to Step 2. □

Note that the algorithm prepares α(·) for computing from each W ∈ 𝒲 a
vertex v ∈ W with the minimum cost c(v) among W. Formally, α(v) (v ∈ V')
stores the vertex v* in the original network 𝒩 having the minimum cost c(v*)
among P_v, where P_v is the set of all vertices u in V which are merged to v.
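The steps of Algorithm Contract translate fairly directly into code. The sketch below is illustrative and rests on several assumptions that are not in the paper: vertices are the integers 0, ..., n − 1, the extra vertex s is represented by −1, merged vertices receive fresh integer labels, the MA ordering uses a binary heap (so the running time is O(n m log n) rather than the bound of Theorem 3), and the halting test of Step 3 is evaluated before each round rather than after it, which does not change the output S. The names `ma_ordering` and `contract` are hypothetical.

```python
import heapq

def ma_ordering(adj, start):
    """MA ordering from `start` using a binary heap with lazy deletion."""
    key = {v: 0 for v in adj}
    heap = [(float("-inf"), start)] + [(0, v) for v in adj if v != start]
    heapq.heapify(heap)
    done, order = set(), []
    while heap:
        _, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        order.append(v)
        for w, c in adj[v].items():
            if w not in done:
                key[w] += c
                heapq.heappush(heap, (-key[w], w))
    return order

def contract(n, edges, cost, g):
    """Source set for uniform demand g; `edges` maps (a, b) to a capacity."""
    s = -1
    adj = {v: {} for v in range(n)}
    adj[s] = {}
    for (a, b), c in edges.items():
        adj[a][b] = adj[a].get(b, 0) + c
        adj[b][a] = adj[b].get(a, 0) + c
    alpha = {v: v for v in range(n)}                  # Step 0
    S = set()

    def attach(v):                                    # add edge (s, v) of capacity g
        adj[s][v] = adj[v][s] = g
        S.add(alpha[v])

    for v in range(n):                                # Step 1
        if sum(adj[v].values()) < g:
            attach(v)
    next_id = n
    # Step 3 (checked up front): stop when one vertex besides s remains
    # or every remaining vertex is joined to s.
    while len(adj) > 2 and any(v not in adj[s] for v in adj if v != s):
        a, b = ma_ordering(adj, s)[-2:]               # (2-I)
        merged = {}                                   # (2-II)
        for x in (a, b):
            for w, c in adj[x].items():
                if w != a and w != b:
                    merged[w] = merged.get(w, 0) + c
        for x in (a, b):
            for w in adj.pop(x):
                if w in adj:
                    del adj[w][x]
        v, next_id = next_id, next_id + 1
        adj[v] = merged
        for w, c in merged.items():
            adj[w][v] = c
        # (2-III): remember the cheapest original vertex inside v
        alpha[v] = alpha[a] if cost[alpha[a]] <= cost[alpha[b]] else alpha[b]
        if sum(adj[v].values()) < g:                  # (2-IV)
            attach(v)
    return S
```

As a usage example, take two triangles {0, 1, 2} and {3, 4, 5} with all triangle edges of capacity 5, joined by an edge (2, 3) of capacity 1, with g = 8 and costs (3, 1, 2, 4, 6, 5). The only minimal deficient sets are the two triangles, and the sketch returns the cheapest vertex of each, namely {1, 3}.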

Example 2. Let us apply Algorithm Contract to the network 𝒩 = (G =
(V, E), u, d, c) given in Figure 3, where u and c are respectively attached to
edges and vertices in Figure 3 and d(v) = 8 for all v. The results are illustrated
in Figure 4. In Step 0, the algorithm initializes S, V', E', and α (see (i)), and
since only v_b satisfies κ(v_b) < 8, Step 1 updates the network (i) to (ii), and S :=
{v_b}. Step 2 then computes an MA ordering v_0 (= s), v_1 (= v_b), v_2 (= v_e), v_3 (=
v_d), v_4 (= v_a), v_5 (= v_c) (see (iii-1)), and merges v_4 (= v_a) and v_5 (= v_c) into v_ac
(see (iii-2)). Since c(α(v_a)) ≤ c(α(v_c)), Step 2 puts α(v_ac) := α(v_a). Moreover,
by κ(v_ac) < 8, Step 2 updates E' := E' ∪ {(s, v_ac)}, u(s, v_ac) := 8, and S :=
S ∪ {α(v_ac) (= v_a)} (see (iii-3)).

Since |V'| = 5 and (s, v_d) ∉ E' for example, the algorithm returns to Step
2. Step 2 again computes an MA ordering v_0 (= s), v_1 (= v_b), v_2 (= v_ac), v_3 (=
v_d), v_4 (= v_e) (see (iv-1)), and merges v_3 (= v_d) and v_4 (= v_e) into v_de (see (iv-2)).
Since c(α(v_d)) ≤ c(α(v_e)), Step 2 puts α(v_de) := α(v_d). Moreover, by κ(v_de) < 8,
Step 2 updates E' := E' ∪ {(s, v_de)}, u(s, v_de) := 8, and S := S ∪ {α(v_de) (= v_d)}
(see (iv-3)). Now E' contains the edges (s, v) for all v ∈ V' − {s}. Step 3 outputs
S = {v_a, v_b, v_d} whose cost is 6. □

Theorem 3. Problem Source Location can be solved in O(n(m + n log n))
time if the demand function d is constant.

Proof. Since the above discussion shows the correctness of algorithm Contract,
we only consider its time complexity. Clearly Steps 0, 1 and 3 take O(n) time.
Step 2 can be executed in O(n(m + n log n)) time since it has n − 1 iterations
and each iteration takes O(m + n log n) time from Lemma 5. Therefore, in total,
it requires O(n(m + n log n)) time.

5 NP-hardness of General Case 

In this section, we show the NP-hardness of Source Location with
non-constant cost and demand functions.

Theorem 4. Problem Source Location is NP-hard, even if the undirected
graph G = (V, E) is a star, i.e., E = {(v, w) | w ∈ V \ {v}} for some v ∈ V.

Proof. We transform Problem Knapsack to this problem, where Knapsack is
known to be NP-hard.

Problem Knapsack

Input: A finite set Z = {z_1, z_2, ..., z_n} associated with a size function σ : Z →
Z+ and a value function ω : Z → Z+, and a positive integer b (< Σ_{z ∈ Z} σ(z)),
where Z+ denotes the set of all nonnegative integers.

Output: A subset X ⊆ Z that is an optimal solution of

$$
\begin{array}{ll}
\text{Maximize}   & \sum_{z \in X} \omega(z) \\
\text{subject to} & \sum_{z \in X} \sigma(z) \le b, \\
                  & X \subseteq Z.
\end{array}
\tag{9}
$$

Fig. 4. Algorithm Contract applied to the network 𝒩 in Figure 3

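It is easy to see that Knapsack is polynomially equivalent to the problem
of computing a subset Y ⊆ Z that solves

$$
\begin{array}{ll}
\text{Minimize}   & \sum_{z \in Y} \omega(z) \\
\text{subject to} & \sum_{z \in Y} \sigma(z) \ge \sum_{z \in Z} \sigma(z) - b, \\
                  & Y \subseteq Z,
\end{array}
\tag{10}
$$

by identifying Y with Z − X. Therefore, in the following we consider (10) instead
of (9).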

For this problem instance, we consider an undirected network 𝒩 = (G =
(V, E), u, d, c) with V = Z ∪ {z_0}, E = {(z_0, z_i) | z_i ∈ Z}, u(z_0, z_i) =
σ(z_i) for i = 1, 2, ..., n and

$$
d(z_i) =
\begin{cases}
\sum_{z_i \in Z} \sigma(z_i) - b & \text{if } i = 0 \\
0 & \text{otherwise,}
\end{cases}
\qquad
c(z_i) =
\begin{cases}
\sum_{z_i \in Z} \omega(z_i) + 1 & \text{if } i = 0 \\
\omega(z_i) & \text{otherwise.}
\end{cases}
$$

Note that d(z_i) = 0 for all z_i ∈ Z. Therefore S ⊆ V covers all vertices in V if
and only if it covers z_0, i.e.,

$$
\lambda(S, z_0) \ge d(z_0) = \sum_{z_i \in Z} \sigma(z_i) - b.
\tag{11}
$$

Moreover, since {z_0} and Z cover z_0, and since c(z_0) > Σ_{z_i ∈ Z} ω(z_i), an
optimal S is contained in Z. This implies that

$$
\lambda(S, z_0) = \sum_{z_i \in S} u(z_0, z_i) = \sum_{z_i \in S} \sigma(z_i),
\tag{12}
$$

and hence (11) is equivalent to the constraint in (10).

Since c(z_i) = ω(z_i) for all z_i ∈ Z, S ⊆ Z is an optimal solution for the
instance of problem (10) if and only if it is optimal for the corresponding instance
for Source Location. □
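The reduction in the proof can be exercised on a small instance. The following sketch is illustrative only: the function names and the three-item instance are assumptions, the hub z_0 is vertex 0 and item z_i is vertex i, and both problems are solved by exhaustive search. On a star, a set S of leaves sends exactly the sum of its edge capacities to the hub, which is the identity (12) used by the coverage test.

```python
from itertools import combinations

def knapsack_to_source_location(sizes, values, b):
    """Build the star network of the proof of Theorem 4."""
    n = len(sizes)
    cap = {(0, i + 1): sizes[i] for i in range(n)}   # u(z_0, z_i) = sigma(z_i)
    demand = [sum(sizes) - b] + [0] * n              # only the hub has a demand
    cost = [sum(values) + 1] + list(values)          # the hub is too expensive
    return cap, demand, cost

def best_source_set(cap, demand, cost):
    """Exhaustive minimum-cost cover on the star (hub is vertex 0)."""
    n = len(cost) - 1
    best = None
    for r in range(n + 2):
        for S in combinations(range(n + 1), r):
            covered = 0 in S or sum(cap[(0, i)] for i in S) >= demand[0]
            total = sum(cost[i] for i in S)
            if covered and (best is None or total < best[0]):
                best = (total, set(S))
    return best

def best_knapsack_value(sizes, values, b):
    """Exhaustive optimum of (9)."""
    n = len(sizes)
    return max(sum(values[i] for i in X)
               for r in range(n + 1)
               for X in combinations(range(n), r)
               if sum(sizes[i] for i in X) <= b)
```

For sizes (2, 3, 4), values (3, 4, 5), and b = 5, the best knapsack packs the first two items for a value of 7, and the cheapest source set is the single vertex of the third item with cost 5 = 12 − 7, matching the identification Y = Z − X.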
6 Conclusion

In this paper, we have analyzed the greedy algorithm of Tamura et al. for
Source Location with a constant cost function and given a simpler proof
based on the linear programming duality. We have also improved the greedy
algorithm to run in O(nM) time. Moreover, we have given an O(n(m + n log n))
time algorithm for Source Location with a constant demand function.
Finally, we have shown that Source Location is in general NP-hard by reducing
Knapsack to Source Location.

Abstract. Let G = (V, E) be a connected graph with positive weights
and n vertices. A subgraph G' is a t-spanner if for all u, v ∈ V, the distance
between u and v in the subgraph G' is at most t times the corresponding
distance in G. We show an O(n log n)-time algorithm which, given a set
V of n points in d-dimensional space, and any constant t > 1, produces a
t-spanner of the complete Euclidean graph of V. The produced spanner
has O(n) edges, constant degree and weight O(wt(MST)).



1 Introduction 

Spanners have applications in the design of geometric networks. Consider a set
V of n points in R^d, where the dimension d is a constant. A geometric network
on V can be modeled as an undirected graph G with vertex set V and with
edges e = (u, v) of weight wt(e). A Euclidean network is a geometric network
where the weight of the edge e = (u, v) is equal to the Euclidean distance d(u, v)
between its two endpoints u and v. For u, v ∈ V, let P be a uv-path in G, i.e., a
path in G between u and v. The weight of the path P is denoted by wt(P) and is
defined as the sum of the weights of the edges of P. Let t > 1 be a real number.
We say that G is a t-spanner for V, if for each pair of points u, v ∈ V, there
exists a uv-path in G of weight at most t times the Euclidean distance between
u and v. A sparse t-spanner is defined to be a t-spanner of size O(n) and
weight O(wt(MST)). Given a geometric network G = (V, E), a weight function
w defined on its edges, and two vertices u, v ∈ V, we let D_{G,w}(u, v) denote the
weight of the shortest path from u to v in G for the weight function w.
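The t-spanner definition can be tested directly on small inputs by comparing shortest-path distances with Euclidean distances. The sketch below is a plain check, not an algorithm from the paper: it runs Dijkstra from every point, and the names `dijkstra` and `is_t_spanner` as well as the small tolerance `eps` for floating-point comparison are assumptions made for illustration.

```python
import heapq
import math

def dijkstra(adj, src):
    """Shortest-path weights from `src`; adj[u] is a list of (v, weight)."""
    dist = [math.inf] * len(adj)
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # outdated heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def is_t_spanner(points, edges, t, eps=1e-12):
    """True if the Euclidean network (points, edges) is a t-spanner."""
    n = len(points)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        w = math.dist(points[u], points[v])   # Euclidean edge weight
        adj[u].append((v, w))
        adj[v].append((u, w))
    for u in range(n):
        dist = dijkstra(adj, u)
        for v in range(u + 1, n):
            if dist[v] > t * math.dist(points[u], points[v]) + eps:
                return False
    return True
```

For the corners of a unit square, three consecutive sides form a 3-spanner but not a 2.9-spanner (the two end corners are at Euclidean distance 1 and path distance 3), while all four sides form a 1.5-spanner.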

The problem of constructing spanners has been investigated by many
researchers. Keil and Gutwin showed that for any t > 1, and any set V of n
points in the plane, a t-spanner for V having O(n) edges can be constructed in
O(n log n) time. Salowe, Vaidya, and Callahan and Kosaraju [5] showed
the same result for any fixed dimension d. Das and Narasimhan gave an
O(n log² n)-time algorithm that constructs for any set V of n points in R^d,
and any constant t > 1, a t-spanner for V in which the degree of every point
is bounded by a constant, and whose total edge weight is proportional to the
weight of a minimum spanning tree of V. Chen et al. showed that the lower
bound for computing any t-spanner for a given set of points V in R^d is Ω(n log n)
in the algebraic computation tree model.

* Funded by NSF (CCR-940-9752) and Cadence Design Systems, Inc.

Mount has shown that a significant result claimed in Arya et al. of an
O(n log n)-time algorithm to compute a sparse Euclidean spanner is incorrect.
Thus the problem of devising an O(n log n)-time algorithm to produce sparse
spanners remained unsolved. Sparse spanners are also useful in designing efficient
approximation schemes for geometric problems. In a startling development, Rao
and Smith showed an optimal O(n log n)-time approximation scheme for the
well-known Euclidean traveling salesperson problem, assuming that it is
possible to compute sparse spanners in time O(n log n). Since the claim by Arya et
al. was incorrect, the existence of an O(n log n)-time algorithm to construct
sparse spanners has become a critical open problem. Note that the most efficient
algorithm to construct sparse spanners is due to Das and Narasimhan and
runs in O(n log² n) time. In this paper we design an algorithm that produces a
t-spanner in time O(n log n), in the standard real RAM model.

Theorem 1. Given a set V of n points in d-dimensional space, and any real
constant t > 1, a t-spanner of the complete Euclidean graph can be constructed
in O(n log n) time such that the spanner has O(n) edges, constant degree and
weight O(wt(MST)). The constants in the O-notation depend on t and d.

It has been shown that the greedy algorithm produces spanners with O(n)
edges and weight O(1) · wt(MST). However, a naive implementation of the greedy
algorithm had a running time of O(n³ log n), mainly due to the fact that Ω(n²)
shortest path queries needed to be answered in a "dynamic" graph with at most
O(n) edges, each of which could take O(n log n) time.
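The naive greedy algorithm referred to above can be written down in a few lines, which also makes the source of its cost visible: one shortest-path query per point pair. This is a minimal sketch of that naive baseline under stated assumptions, not the algorithm of this paper; the names `shortest_path` and `greedy_spanner` are hypothetical, and pairs are processed in order of increasing Euclidean length with ties broken by index order.

```python
import heapq
import math
from itertools import combinations

def shortest_path(adj, src, dst):
    """Weight of a shortest src-dst path (math.inf if none) via Dijkstra."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue                      # outdated heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return math.inf

def greedy_spanner(points, t):
    """Naive greedy t-spanner: scan all pairs by increasing length and keep
    an edge only if the current graph has no path of weight <= t * length."""
    n = len(points)
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
    adj = [[] for _ in range(n)]
    edges = []
    for u, v in pairs:
        w = math.dist(points[u], points[v])
        if shortest_path(adj, u, v) > t * w:      # query answered "no"
            adj[u].append((v, w))
            adj[v].append((u, w))
            edges.append((u, v))
    return edges
```

For four collinear points at unit spacing and t = 1.5, only the three consecutive edges are kept, since every longer pair is already connected by a path of exactly its own length.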

Our algorithm is inspired by the algorithm due to Das and Narasimhan.
They showed how to use "clustering" in order to speed up shortest path queries.
However, their algorithm was not efficient enough because they were unable
to "maintain" the clusters efficiently and the algorithm had to frequently
rebuild the clusters. For convenience, we will refer to their O(n log² n)-time
algorithm as the DN-Clustering spanner algorithm. We retain the general
framework of that algorithm. Our main contribution is in developing techniques
to efficiently perform "clustering". We believe that the techniques that we have
developed are likely to be useful in designing other greedy-style "dynamic
algorithms", i.e., in situations where only insertions take place and particularly
in "increasing" order of length. What we prove in this paper is that after some
preprocessing (which takes O(n log n) time), given a linear-sized edge-weighted
graph with integral edge weights in the range [0..N], and given a set of
"cluster-centers", then we can perform "clustering" very efficiently in only O(n + N)
time. In 1999, Thorup showed that single source shortest path queries could
be answered in linear time for undirected graphs with integer weights. The main
reason why this algorithm is not used in this paper is that it does not visit the
vertices in order of increasing distance, which is crucial for our algorithm. Also,
it uses bit-shift for computing the floor function in constant time, which we do
not allow in the real RAM model.

2 The DN-Clustering Spanner Algorithm 

We first describe the cluster-based spanner algorithm by Das and Narasimhan.
It can be roughly described as follows. The algorithm starts with an empty
spanner G'. A preprocessing step helps to eliminate all but a linear number of edges.
Among the edges not eliminated, very short edges (i.e., those of length at most
D/n, where D is the distance between the farthest pair of points) are simply
added to G' since their contribution to the overall weight of the spanner
cannot be more than the weight of a minimum spanning tree, wt(MST). For the
remaining edges, the greedy algorithm is simulated by sorting the edges, by
increasing weight, and then processing them in log n phases. Greedy
processing of an edge e = (u, v) entails a shortest path query, i.e., checking whether
D_{G',wt}(u, v) ≤ t · wt(e). If the answer to the query is no, then e is added to
the graph G', else it is discarded. Whenever shortest path queries are required
to be answered, these are not solved on the graph G' being constructed. Instead,
they are solved on a "cluster-graph" H, which is simultaneously maintained.
The cluster graph H has the following properties:

1. distances in H “closely” approximate distances in the current graph G', 

2. every vertex in H has bounded degree, and 

3. “specialized” shortest path queries in H can be answered in O(1) time.

The shortest path query when processing edge e = (u,v) is “specialized” in the
sense that, at the instant that this query is processed, the cluster-graph H
only has edges whose lengths are within a constant factor of wt(e). For all
practical purposes, cluster-graph H behaves like an unweighted graph of bounded
degree for which a bounded radius subgraph around vertex u needs to be searched
for the presence of vertex v. Since the edges considered have weights in the
range (D/n, D] and they are processed in log n phases, the edges can be sorted
into log n bins, where the i-th bin has edges of weight in the range
$(2^{i-1} D/n, 2^i D/n]$. In order for shortest path queries to be answered
quickly, the cluster-graph has to be carefully maintained. At the end of each
phase, the cluster-graph is recomputed from scratch using the graph G'. This
was deemed necessary since, in order to answer specialized shortest path
queries about edge e = (u,v) in O(1) time, all edges in H need to be of length
within a constant factor of d(u,v).
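
The binning step can be illustrated with a small sketch. It assumes the bin
boundaries $(2^{i-1} D/n, 2^i D/n]$ stated above and finds each bin by
repeated doubling of the boundary rather than by a logarithm or floor
function. The name `bin_edges` and the dictionary layout are hypothetical and
chosen only for illustration.

```python
def bin_edges(weights, D, n):
    """Group edge weights in (D/n, D] into bins (2^(i-1) D/n, 2^i D/n], i >= 1."""
    bins = {}
    upper, i = 2 * D / n, 1          # upper boundary of bin 1
    for w in sorted(weights):
        while w > upper:             # advance to the bin that contains w
            upper *= 2
            i += 1
        bins.setdefault(i, []).append(w)
    return bins
```

Because the weights are visited in sorted order, the boundary only ever moves
forward, so the scan costs one pass over the edges plus one step per bin.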

The time complexity analysis is straightforward. Preprocessing steps ran in
O(n log n) time. The O(n) shortest path queries were processed in O(n) time,
since each query took only O(1) time. The cluster graph computation at the
start of each phase took O(n log n) time (using Dijkstra’s algorithm on
linear-sized graphs). Since there were log n phases, the cluster-graph
computations took a total of $O(n \log^2 n)$ time. The crucial observation made
by Das and Narasimhan was that shortest path queries need not be answered
precisely. Instead, approximate shortest path queries suffice to produce
low-weight spanners. The second observation was that shortest path queries are
expensive if the shortest path involves a number of small length edges, and
that clustering can help eliminate all small length edges. This, of course,
meant that the greedy algorithm, too, was only approximately simulated by the
algorithm.
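
For reference, the exact greedy algorithm that the clustering approach
simulates can be sketched as follows. This is a minimal illustration of the
slow baseline, not of the DN-Clustering algorithm: it answers every shortest
path query with a standard heap-based Dijkstra search on the spanner built so
far. The function names `bounded_dijkstra` and `greedy_spanner` are
hypothetical.

```python
import heapq
import math
from itertools import combinations

def bounded_dijkstra(adj, source, target, bound):
    """Return True if some path from source to target has weight <= bound."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return True
        if d > dist.get(u, math.inf):
            continue                      # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= bound and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return False

def greedy_spanner(points, t):
    """Exact greedy t-spanner of the complete Euclidean graph on `points`."""
    n = len(points)
    adj = [[] for _ in range(n)]
    edges = sorted((math.dist(points[u], points[v]), u, v)
                   for u, v in combinations(range(n), 2))
    kept = []
    for w, u, v in edges:
        # Shortest path query: is there already a path of weight <= t * wt(e)?
        if not bounded_dijkstra(adj, u, v, t * w):
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept
```

On the four corners of a unit square with t = 1.5, the four sides are kept and
both diagonals are discarded, since each diagonal is already approximated by a
two-edge path of length 2.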



2.1 A Faster Spanner Algorithm 

In this section, we present a simple modification to the DN-clustering algorithm
to construct sparse t-spanners. This algorithm improves on the time complexity
of the DN-clustering algorithm and runs in time $O(n \log^2 n / \log\log n)$ in
the algebraic decision tree model of computation.

First we make the observation that there is wide disparity in the overall
time spent by the DN-clustering algorithm on shortest path queries (O(n)) and
the time spent on the cluster-graph computations ($O(n \log^2 n)$). In order to
balance out the two costs, it is necessary to do fewer than O(log n)
cluster-graph computations, which in turn would make the shortest path queries
more expensive. Instead of processing the edges in log n phases, we process
them in $\log n / \log\log n$ “batches”. We use the term batches to distinguish
from the word phases used by the earlier DN-clustering algorithm. If
“clustering” is recomputed after processing every batch of edges, since each
call to the clustering algorithm takes O(n log n) time, the total time for
cluster graph computations will be $O(n \log^2 n / \log\log n)$. We carefully
analyze the cost of the O(n) shortest path queries and show that they can now
be answered in a total of O(n log n) time. In the DN-clustering algorithm, in
phase i, edges from the i-th bin were processed. These edges had weights in the
range (W, 2W], where $W = 2^{i-1}(D/n)$. During phase i, the cluster graph H
could have edges (“inter-cluster” edges) whose weights were in the range
$(\delta W, 2W(1 + 2\delta)]$. This meant that for an edge e = (u,v) of weight
$l \in (W, 2W]$, checking whether there is a path from u to v of length at most
$t \cdot l$ could be done in O(1) time. More precisely, it was observed by Das
and Narasimhan that if there does exist a path from u to v of length at most
$t \cdot l$, then the number of edges on this path can be at most $2t/\delta$
(since $l \le 2W$). It was further observed that since the vertices of H had a
constant degree bound (say c), and since there are at most $O(c^{2t/\delta})$
vertices that lie at most $2t/\delta$ edges away from vertex u, this shortest
path query could be done in $O(c^{2t/\delta} \log c^{2t/\delta})$ time (running
Dijkstra’s algorithm starting from vertex u suffices). A tighter analysis was
unnecessary in the DN-Clustering algorithm since c, t, and $\delta$ were all
constants; below we show an improved analysis of this cost.

Recall that our algorithm works in $O(\log n / \log\log n)$ batches. Batch i of
our algorithm can be described as follows. For
$W = 2^{(i-1) \log\log n / (2d)} (D/n)$, the edges processed in batch i have
weights in the range $(W, W \cdot 2^{\log\log n / (2d)}]$, i.e., they are in
the range $(W, W (\log n)^{1/(2d)}]$. This implies that, for an edge e = (u,v)
of weight $l \in (W, W (\log n)^{1/(2d)}]$, we need to check whether there is a
path from u to v of length at most $t \cdot l$. During batch i, the cluster
graph H could have edges (“inter-cluster” edges) whose weights are in the range
$(\delta W, (1 + 2\delta) W (\log n)^{1/(2d)}]$. Thus, if there does exist such
a path from u to v, then the number of edges on this path can be at most
$t (\log n)^{1/(2d)} / \delta$. The crucial observation we make is that the
vertices of the cluster-graph correspond to clusters of radius $\delta W$. These
clusters may overlap, but their centers can lie in only one cluster. In other
words, if these clusters are shrunk in half, they do not intersect. Thus the
vertices correspond to disjoint clusters of radius $\delta W/2$. Now, it is
possible to bound the number of vertices within distance at most
$t \cdot l \le tW (\log n)^{1/(2d)}$. A simple packing argument shows that the
number of balls of radius r that can be packed in a ball of radius R is bounded
by $O((R/r)^d)$, where d is the dimension of the space. In our case, the number
of balls of radius $r = \delta W/2$ that can be packed in a ball of radius
$R = tW(\log n)^{1/(2d)}$ is at most $O((2t(\log n)^{1/(2d)}/\delta)^d)$. Thus
the maximum number of vertices (and edges, due to the constant degree) that can
be reached when performing Dijkstra’s algorithm starting from vertex u is
$O((2t(\log n)^{1/(2d)}/\delta)^d)$. Since t, d and $\delta$ are constants,
$O((2t(\log n)^{1/(2d)}/\delta)^d) = O((\log n)^{1/2})$. We conclude that
Dijkstra’s algorithm for a shortest path query has a time complexity of
$O((\log n)^{1/2} \cdot \log((\log n)^{1/2})) = O(\log n)$.

The obvious consequence is that all O(n) shortest path queries can be answered
in O(n log n) time, and hence, we have proved the following theorem:



Theorem 2. In the algebraic decision tree model of computation, given a set V
of n points in d-dimensional space, and any real constant t > 1, a t-spanner of
the complete Euclidean graph can be constructed in
$O(n \log^2 n / \log\log n)$ time such that the spanner has O(n) edges,
constant degree and weight O(1) · wt(MST). The constants implicit in the
O-notation depend on t and d.

3 An Improved Spanner Algorithm 

In the rest of the paper, we describe an efficient algorithm to construct sparse
spanners with a running time of O(n log n). The running time of O(n log n) for
our algorithm is achieved by designing an O(n)-time algorithm for the clustering
step, thus executing all the clustering steps in O(n log n) time. Note that the
running time is O(n log n) even if clustering is executed O(log n) times.

One crucial idea that we employ to speed up the clustering is to replace
the real-valued weights by integral values. As observed by Das and Narasimhan,
the shortest path queries required by the algorithm need not be answered
precisely; approximately correct answers suffice. A convenient way to achieve
the integralization is to use the floor/ceiling function. However, this assumes
a more powerful model of computation. In order to get around this problem, we
compute the O(n) floor/ceiling functions needed by using operations allowed
under the RAM model. The second crucial component of our algorithm is an
implementation of the clustering algorithm in O(n) time assuming small integral
weights for the edges. We also prove that the integralization introduces only a
bounded amount of error, and that this error retains the correctness of the
other required operations.

The improved spanner algorithm can be roughly described as follows. It is
important to note that the skeleton of the algorithm is similar to the
DN-clustering algorithm of Das and Narasimhan. In particular, this improved
algorithm also runs in O(log n) phases. The algorithm starts with an empty
graph G' and employs the same preprocessing step to eliminate all but a linear
number of edges. This step is done by a call to the t-spanner algorithm
presented by Arya et al., with $\sqrt{t/t'}$ as input parameter. Note that this
algorithm is correct and runs in time O(n log n). It also guarantees that the
graph has constant degree. As before, short edges of length at most D/n are
simply added to G'; their contribution to the overall weight of the spanner is
bounded by wt(MST). The greedy algorithm is now simulated on the remaining
edges and the edges are added to the graph G'. The edges of the graph have
real-valued weights that are equal to the Euclidean distance between their
endpoints. The edges are sorted by increasing weight and then processed in
log n phases. Each of the edges also has a corresponding integer-valued weight
that is a sufficiently close approximation of the real-valued weight; these
integer-valued weights change through the course of the algorithm. In order to
distinguish between the real- and integer-valued weights, we assume that there
are two different weight functions defined on the edges of G'. For an edge
e = (u,v), the real-valued weight function wt(e), as mentioned before, is
defined as the Euclidean distance d(u,v) between u and v. The integer-valued
weight function, denoted by $Iwt_i(e)$, is a function of wt(e) and the phase
number i, and is maintained by the algorithm as described later. Whenever the
phase number is clear from the context, we use the simpler notation Iwt(e)
instead of $Iwt_i(e)$. Also, unless specified otherwise, when we refer to the
weight of an edge, we are referring to the real-valued weight of the edge. At
the start of each phase, the integer-valued weight function Iwt(e) is
recomputed for this phase. Then a set of vertices of G' is selected as
cluster-centers and a cluster graph H is constructed from the current spanner
graph G' (using the weight function Iwt); this graph H is a simpler graph than
the graph G', and distances between vertices in H are reasonably close to
distances between the same pair of vertices in G'. The difference between this
cluster graph and the one of Das and Narasimhan lies in the fact that the
cluster-centers have to be selected before the clustering is done and the
clustering is done with the weight function Iwt. As mentioned before, we
improve on the time complexity of this clustering step and show how it can be
implemented to run in O(n) time. Once the cluster graph H is constructed, the
algorithm processes the set of edges for that phase. Greedy processing of an
edge e = (u,v) entails a shortest path query, i.e., checking whether
$D_{\{G',wt\}}(u,v) \le t \cdot wt(e)$. As in the algorithm of Das and
Narasimhan, this query is answered in O(1) time per query by performing an
approximate shortest path query on the simpler graph H. If the answer to the
query is no, then edge e is added to the graph G', else it is discarded. Each
of the steps is described in more detail in the rest of the paper.

Since the edges that remain to be considered have weights in the range
(D/n, D] and they are processed in log n phases, the edges can be sorted into
log n bins, where the i-th bin has edges of weight in the range
$(2^{i-1} D/n, 2^i D/n]$. At the start of each of the log n phases, the
algorithm calls the “clustering” algorithm, which is required to answer the
shortest path queries efficiently. The clustering algorithm is described in
Section 3.2. Later we show that the running time of our algorithm is
O(n log n). Note that processing is done in log n phases. If fewer phases are
used, as in the faster spanner algorithm described in Section 2.1, then the
error due to integralization could be too large. Even if fewer phases could be
used, the running time of the overall algorithm would remain O(n log n), since
it is dominated by other steps in the algorithm. In particular, the
integralization itself has an initial cost of O(n log n).

The detailed algorithm is given below in Fig. 1.



Algorithm Improved-Greedy(V, t, t')

1. Compute a √(t/t')-spanner G = (V, E) using the algorithm of Arya et al.

o ^ ( VE'—t' \/t£^+34\/tt^ + l — {Vt£^+5) ^ 

Z. 0 .- mm 24 ) 

3. sort E; D := weight of largest edge in E;
4. W_0 := 0; W_i := 2^(i-1) · D/n for i = 1, 2, ..., log n
5. I_i := (W_i, W_{i+1}] for i = 0, 1, ..., (log n - 1)
6. E_i := (sorted) edges of E with weights in I_i; E' := E_0; G' := (V, E');
7. Integralize(E_0, 0)
8. C_1 := Naive-Centers(G', δ · W_1); M_1 := ∅
9. for i := 1 to log n do
10.    Integralize(E_i, i)
11.    ReIntegralize(E_0 ∪ E_1 ∪ ... ∪ E_{i-1})
12.    H := Cluster-Graph(G', Iwt, C_i, r, R)
13.    for each edge e = (u, v) ∈ E_i do
14.       if not Short-Path(H, u, v, √(tt') · d(u, v)) then
15.          E' := E' ∪ {e}; G' := (V, E')
16.    C_{i+1} := Update-Centers(H, i, C_i, r)
17. output G'



Fig. 1. The O(n log n)-time spanner algorithm



3.1 Integralization 

As mentioned before, in order to speed up the cluster-graph computation, we
replace the real-valued edge weights by integral values. The integralization
changes in every phase. It is done in such a way that the edge weights and
distances encountered in that phase are always in the range [0..N], where
N = c · n for some constant integer c. The choice of c will dictate the errors
introduced in the distance computations; this will be discussed later.

A closer inspection of a phase leads to the following simple observations. At
the start of phase i, the spanner graph constructed so far has edges of weight
at most $W_i$. During phase i, the edges considered for inclusion by the greedy
algorithm are in the range $(W_i, 2W_i]$. The shortest path query for an edge
of length l involves checking whether the distance between a given pair of
vertices is at most $t \cdot l$. Hence the longest paths that need to be dealt
with during phase i are of weight $t \cdot 2W_i$. The idea is to make the
largest distance correspond to the integer c · n. To be on the safe side, since
there are small errors in the distance computations, we set $2t \cdot 2W_i$ to
correspond to c · n. Thus, in phase i, unit integer length will correspond to
real length of $U_i = 4tW_i/(cn)$.

Although a constant-time floor/ceiling function is not used in the algorithm,
a convenient way to describe the integralization is as follows:
$Iwt_i(e) := \lceil wt(e)/U_i \rceil$.

Error Bounds: Assuming the integralization defined above, we observe that
the function Iwt always involves a “rounding up”. Hence,
$Iwt_i(e) \cdot U_i \ge wt(e)$. It is also easy to see that in phase i, the
error in the length of any single edge of the spanner graph is at most $U_i$.
In other words, $Iwt_i(e) \cdot U_i - wt(e) < U_i$. Note that this error is an
additive or an absolute error. Since any simple path can use at most n - 1
edges, it is also easy to see that the error in the length of any simple path
of the spanner graph is at most $nU_i$. Another consequence is that given two
simple paths $P_1$ and $P_2$, if $Iwt(P_1) = Iwt(P_2)$, then
$|wt(P_1) - wt(P_2)| < nU_i$. It follows that $nU_i$ is also a bound on the
error that can be introduced when running Dijkstra’s
single-source-shortest-path algorithm using the integral weights instead of the
real weights. The following lemma formalizes this statement:

Lemma 1. In phase i, given any $\varepsilon > 0$ and given vertices u and v in
G' such that $D_{\{G',wt\}}(u,v) > W_i$, we have
$D_{\{G',wt\}}(u,v) \le D_{\{G',Iwt\}}(u,v) \cdot U_i \le (1 + \varepsilon) \cdot D_{\{G',wt\}}(u,v)$.

Proof. We give a sketch of the proof. In phase i, for a path P such that
$wt(P) > W_i$, the error in computing its weight is at most $nU_i$. Thus the
relative error (i.e., the ratio of the error to the weight of the path) is at
most $nU_i/W_i = 4t/c$. The proof follows by setting $\varepsilon = 4t/c$ and
using the well-known property of Dijkstra’s algorithm that the minimum value in
the priority queue is monotonically non-decreasing. Note that $\varepsilon$ can
be made as small as desired by choosing an appropriate value of c. □



Corollary 1. For a path P in G' with $wt(P) > \delta W_i$ (i.e., Iwt(P) > R),
the absolute error in computing its weight is at most $nU_i$, and the relative
error is at most $nU_i/(\delta W_i) = \varepsilon$, for any $\varepsilon > 0$.
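
The rounding and its error bounds can be checked numerically with a small
sketch. It assumes, for illustration only, a constant-time ceiling function
(which the algorithm itself avoids) and a unit length obtained by mapping
$2t \cdot 2W_i$ to the integer c · n, as described above. The names
`unit_length`, `integralize`, and `path_error` are hypothetical.

```python
import math

def unit_length(t, W_i, c, n):
    # Real length of one integer unit in phase i, so that 2t * 2W_i maps to c * n.
    return 4.0 * t * W_i / (c * n)

def integralize(wt, U):
    # Round up: the result I satisfies I * U >= wt and I * U - wt < U.
    return math.ceil(wt / U)

def path_error(weights, U):
    # Absolute error of a path when its edges are integralized one by one.
    return sum(integralize(w, U) * U for w in weights) - sum(weights)
```

For a path with k edges the value returned by `path_error` stays below k · U,
which is the additive bound used in the text with k at most n - 1.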



Computing the Integralization. Here we show how to compute the integer 
values of the weights of the edges over all phases in O(nlogn) time without 
using the floor/ceiling function. 

We first observe that the spanner graph has at most O(n) edges at the start
of any phase. Consider a specific phase i. In this phase, for a specific edge,
since its integer value is in the range [0..N] (where N = c · n), it can be
computed in O(log n) time without the use of the floor/ceiling function by
performing a binary search on the set of real values $j \cdot U_i$, for
j = 0, ..., N. We assume that the function Integralize(E, i) performs this
operation for each edge in the set E in O(log n) time per edge.
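
A minimal sketch of this binary search is given below. It only compares the
edge weight against the values j · U and never calls a floor or ceiling
function on a real number; the midpoint is computed with ordinary Python
integer arithmetic, which is an implementation convenience of this sketch and
not a statement about the model of computation. The name
`integralize_by_search` is hypothetical.

```python
def integralize_by_search(wt, U, N):
    """Smallest j in [0..N] with j * U >= wt, using O(log N) comparisons."""
    lo, hi = 0, N
    while lo < hi:
        mid = (lo + hi) // 2
        if mid * U >= wt:
            hi = mid              # mid is large enough, try smaller values
        else:
            lo = mid + 1          # mid is too small
    return lo
```

The sketch assumes that wt is at most N · U, which holds in a phase because all
distances of interest are mapped into the range [0..N].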

If the above observations are used in a naive fashion for all edges, then the
cost of integralization is O(n log n) just for one phase. Since the number of
phases is not a constant, the integralization would turn out to be too
expensive. Our algorithm spends O(log n) time for computing the integralization
of an edge weight over all the phases. The idea is to compute the integral
value in O(log n) time when the edge is encountered for the first time.
Integralizations of an edge for subsequent phases are done by calling
ReIntegralize, and are computed in O(1) time from the integer weight of the
edge computed in the previous phase. If the integral weight of an edge is I in
phase i, then the integral weight of the edge in phase i + 1 will be I/2 if I
is even, and (I + 1)/2 if it is odd. This is correct since $U_{i+1} = 2U_i$,
i.e., the integralization in phase i + 1 is twice as coarse as that in phase i.
Checking if an integer is odd or even cannot be done in constant time in the
real RAM model, but can quite easily be accomplished by using O(n log n)
preprocessing. One way to accomplish this would be to build a balanced binary
tree including c · n elements with the values 1 ... c · n. Every element in the
tree with value j also contains a pointer to the element in the tree containing
the value $\lceil j/2 \rceil$. This value can be computed in time O(log n) and
searching the tree for the value is also done in O(log n). Hence, by using
O(n log n) time preprocessing, the integral weight of an edge for the next
phase can be computed in constant time. Note also that the absolute error for
an edge with newly computed weight is less than $U_{i+1}$, hence Lemma 1 still
holds. It is clear that ReIntegralize(F) performs its operation for each edge
in the edge set F in O(1) time per edge.
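
The halving step can be sketched with a precomputed lookup table. The text
describes a balanced binary tree with pointers; the array used here is a
stand-in chosen for brevity and is an assumption of this sketch, as are the
names `build_half_table` and `reintegralize`. The table is filled without any
parity test by using the recurrence ceil(j/2) = ceil((j-2)/2) + 1.

```python
def build_half_table(N):
    """half[j] = ceil(j / 2) for j = 0..N, built without testing parity."""
    half = [0] * (N + 1)
    if N >= 1:
        half[1] = 1
    for j in range(2, N + 1):
        half[j] = half[j - 2] + 1
    return half

def reintegralize(I, half):
    # Integer weight in the next phase, where the unit length has doubled.
    return half[I]
```

With this table, moving an edge from phase i to phase i + 1 is a single lookup:
an even weight I becomes I/2 and an odd weight becomes (I + 1)/2.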

The above explanation proves that the integralization is computed in time
O(n log n) for all edges over all phases. The integer weights are then used
directly in the clustering algorithms described below.

3.2 Clustering the Graph 

First, some definitions. Here we assume that G = (V, E) is an arbitrary
weighted graph, with weight function w defined on the edges in E. The following
definition of a cluster is modified from the one of Das and Narasimhan to allow
for arbitrary weight functions. The definition of a cluster-cover is also
modified and is defined for a given set of cluster-centers.

Definition 1. Cluster, cluster-center, and radius 

Given a Euclidean graph G = (V,E), a vertex $v \in V$, a radius r, and a weight
function w defined on the edges in E, Cluster(G, v, r, w) is defined as the set
of all vertices u such that $D_{\{G,w\}}(v,u) \le r$. The vertex v is called the
cluster-center of this cluster and r is called the radius of the cluster.

Definition 2. Cluster-cover 

Given a Euclidean graph G = (V,E), a set
$C = \{v_1, v_2, \ldots, v_m\} \subseteq V$, a radius r, and a weight function
w defined on the edges in E, the Cluster-Cover(G, C, r, w) is defined as a set
of clusters $\mathcal{K} = \{K_1, K_2, \ldots, K_m\}$ such that $K_i$ is a
cluster with radius r and cluster-center at $v_i$ (i = 1, 2, ..., m), and such
that $K_1 \cup K_2 \cup \cdots \cup K_m = V$.

During the course of the algorithm, clustering is performed on the spanner 
graph G' with the weight function Iwt. Also, the set C and the value r will be 
chosen in such a way that the cluster-cover always exists. In general, clusters in 
a cluster-cover may overlap. We also modify the definition of a cluster-graph so 
that it is a bit more general and it is defined for a given set of clusters and for 
an arbitrary weight function. 

Definition 3. Cluster graph 

Assume that G = (V,E) is a Euclidean graph with a weight function w defined
on its edges. Assume that $C = \{v_1, v_2, \ldots, v_m\} \subseteq V$ is a given
set of cluster-centers. For a given radius r, we assume that
$\mathcal{K} = \{K_1, K_2, \ldots, K_m\}$ is equal to
Cluster-cover(G, C, r, w). Given R > r, the Cluster-graph(G, w, C, r, R)
is defined as a graph $H = (V, E_H)$ with a weight function w defined on its
edges $E_H$. The weight of an edge [u,v] in $E_H$ is defined to be equal to
$D_{\{G,w\}}(u,v)$. The edges of H are defined as follows.

Intra-cluster edges: For all $K_i$, and for all $u \in K_i$,
$[u, v_i] \in E_H$.

Inter-cluster edges: For all $v_i, v_j \in C$, $[v_i, v_j]$ is an inter-cluster
edge if either:

1. $v_i \notin K_j$ and $v_j \notin K_i$ and $D_{\{G,w\}}(v_i, v_j) \le R$ (Type 1), OR

2. there exists $e = (u_i, u_j) \in E$ such that $u_i \in K_i$, $u_j \in K_j$ (Type 2).

Computing the Cluster Graph. Here we describe how the cluster graph is
computed efficiently under some assumptions. We first describe how a
cluster-cover is computed. Once a cluster-cover is computed we show that the
cluster graph can be easily computed. Note that the input to the cluster-cover
computation is a weighted graph G = (V, E) with a weight function w defined on
its edges, a set $C \subseteq V$ of cluster-centers and a radius R. We will
assume that |V| = n, |E| = O(n), the weight function w is integral, and R is an
integer. Since we do not have to deal with distances greater than R, we can
safely assume that the weight of any edge is an integer value in the range
[0..R]. We will further assume that the cluster-centers are chosen in such a
way that a cluster-cover exists, which will be shown in Section 3.3.

The obvious way to implement this algorithm is as was done by Das and
Narasimhan, i.e., to run Dijkstra’s SSSP algorithm from all the cluster-centers
and to compute the clusters in the cluster-cover. However, this has a running
time of O(n log n). In order to speed it up, we run Dijkstra’s algorithm in
parallel from all the cluster-centers and use a simple and faster priority
queue, which we denote by PQ. The priority queue we use is an array of size R,
indexed from 0 to R. This is sufficient for our purposes because of the
following reasons. Firstly, the weight function is integral and the array
contains all possible distance values from the cluster-centers to vertices in
the clusters. Secondly, in Dijkstra’s algorithm, once a vertex has been
extracted from the priority queue, its distance from the source will never be
updated again and the distance from the source at the time of the extraction is
the correct distance from the source. In other words, the minimum value of the
items in the priority queue is monotonic. Since the priority queue is an array,
Extract-Min can be implemented as a scan through the array for the “next”
largest item. This means that the O(n) calls to Extract-Min needed by
Dijkstra’s algorithm can be implemented in O(n + R) time.

One problem is that clusters can overlap and that vertices may have entries in
the priority queue with distances from several cluster-centers. This can be
taken care of by augmenting the priority queue entries to also store
information about the vertex as well as the corresponding cluster-center. It
should also be noted that Dijkstra’s algorithm needs to perform a number of
Relax steps and that in each such step the priority queue may need to be
updated. The process of Relaxing an edge (u, v) consists of testing whether we
can improve the shortest path to v found so far by going through u and, if so,
updating the value for v. It should be pointed out that this is the only place
where we are unable to eliminate the use of Random Access since it is critical
that this update be performed efficiently, i.e., in O(1) time. Also note that
an edge (u, v) may be Relaxed several times, each time with respect to a
different cluster-center. Thus the time and space complexity of the algorithm
is affected by the amount of overlap of the clusters in the cluster-cover. A
careful implementation of cluster-cover can be made to run in time
$O(m \cdot c_v \cdot c_e + R)$, where m is the number of cluster-centers, $c_v$
is the maximum number of clusters that contain a vertex, and $c_e$ is the
maximum number of clusters that contain one of the endpoints of an edge. (In
Section 3.3 we show that for our purposes $c_v$ and $c_e$ are constants.)
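
The array-based priority queue and the parallel growth of clusters can be
sketched as follows. This is a minimal illustration under stated assumptions,
not the careful implementation referred to above: the graph is given as
adjacency lists with integer weights, each queue entry stores a vertex together
with the cluster-center it was reached from, and stale entries are skipped
instead of being updated in place. The name `cluster_cover` and the returned
dictionary layout are hypothetical.

```python
def cluster_cover(adj, centers, R):
    """Multi-source Dijkstra using an array of R + 1 buckets as priority queue.

    adj[u] is a list of (v, w) pairs with integer weights w >= 0.
    Returns dist, where dist[c][v] is the distance from center c to v,
    recorded only when that distance is at most R.
    """
    buckets = [[] for _ in range(R + 1)]
    dist = {c: {c: 0} for c in centers}
    for c in centers:
        buckets[0].append((c, c))             # entry = (vertex, center)
    for d in range(R + 1):                    # monotone scan over the array
        bucket = buckets[d]
        k = 0
        while k < len(bucket):                # zero-weight edges may append here
            u, c = bucket[k]
            k += 1
            if dist[c].get(u) != d:
                continue                      # stale entry: a shorter path exists
            for v, w in adj[u]:               # Relax (u, v) with respect to c
                nd = d + w
                if nd <= R and nd < dist[c].get(v, R + 1):
                    dist[c][v] = nd
                    buckets[nd].append((v, c))
    return dist
```

The cluster of a center c is the key set of `dist[c]`, and a FindCenters-style
query for a vertex v can be answered by collecting the centers c with v in
`dist[c]`. The scan over the buckets never moves backwards, which is what makes
the total cost of all Extract-Min operations linear in the number of entries
plus R.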

We now describe how to compute the cluster graph. The input is a weighted
graph G with a weight function w, a set of cluster-centers
$C = \{v_1, \ldots, v_m\}$, and two different radii r and R. In order to compute
the cluster graph, the algorithm computes a cluster-cover from the same set of
cluster-centers but with the two radii, r and R. Let the cluster-covers with
radii r and R be denoted by $\mathcal{K}_r$ and $\mathcal{K}_R$ respectively. We
augment the cluster-cover procedure to also produce a data structure that
supports the following queries for both the cluster-covers:
(a) FindCenters(v, $\mathcal{K}$): Given $v \in V$, it returns all
cluster-centers $v_i$ such that v is in a cluster from $\mathcal{K}$ centered at
$v_i$, i.e., $D_{\{G,Iwt\}}(v, v_i)$ is at most the radius of the clusters in
$\mathcal{K}$. It also returns $D_{\{G,w\}}(v, v_i)$ for these cluster-centers,
and (b) ComputeDistance($v_i$, v): Given $v \in V$ and a cluster-center $v_i$,
it returns $D_{\{G,Iwt\}}(v, v_i)$ if $D_{\{G,Iwt\}}(v, v_i) \le R$; otherwise,
it returns the value ∞.

Now the cluster graph $H = (V, E_H)$ is computed easily as follows. The
intra-cluster edges of H are computed by performing FindCenters queries for
each vertex $v \in V$ in the cluster-cover $\mathcal{K}_r$ and adding the
corresponding edges. The inter-cluster edges can be of two types. An edge
$[v_i, v_j]$ of type 1 is added if $v_i \notin K_j$ and $v_j \notin K_i$ and
$D_{\{G,Iwt\}}(v_i, v_j) \le R$. Note that $K_i$ and $K_j$ are clusters of
radius r with centers at $v_i$ and $v_j$ respectively. For every cluster-center
$v_i$, we use the FindCenters query to list all the clusters from
$\mathcal{K}_R$ that it is contained in. The centers $v_j$ of these clusters
satisfy the condition that $D_{\{G,Iwt\}}(v_i, v_j) \le R$. Now we use the
ComputeDistance queries to make sure that $v_i \notin K_j$ and
$v_j \notin K_i$. A careful consideration of all the steps above shows that the
time complexity of computing the cluster graph is $O(n \cdot c_e)$. Having the
cluster centers before performing the clustering enables clusters to be grown
in “parallel” and thus the above algorithm is able to use one common priority
queue to grow all the clusters, and is consequently able to perform the
clustering efficiently.



Maintaining the Cluster Graph. An edge of type 2 is added if there exists an
edge $e = (u_i, u_j) \in E$ such that $u_i \in K_i$ and $u_j \in K_j$. During
the computation of the cluster graph H, only intra-cluster edges and
inter-cluster edges of type 1 are added. Additional edges may be added during a
phase of the greedy algorithm. Every time the greedy algorithm decides to add
an edge e = (u,v), several inter-cluster edges of type 2 may be added to H.
This is achieved as follows: for every edge $e = (u_i, u_j)$ that is to be added
to G', perform FindCenters queries for $u_i$ and $u_j$ from $\mathcal{K}_r$ and
join the corresponding cluster-centers by an inter-cluster edge in H. The
weight of each such edge is computed by performing two ComputeDistance queries
for $u_i$ and $u_j$ with the corresponding cluster-centers and adding the
results to $w(u_i, u_j)$. The above function runs in O(1) time.
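
The update of H after an edge is accepted can be sketched as follows. The
sketch assumes a query function `find_centers(x)` that returns, for a vertex
x, a dictionary from the centers of the clusters containing x to the distance
between that center and x; it thereby folds the FindCenters and
ComputeDistance queries of the text into one call. Representing H as a
dictionary keyed by unordered pairs, and keeping the smaller weight when a pair
is already joined, are choices made for this illustration only.

```python
def add_type2_edges(H, find_centers, u, v, w_uv):
    """Join the centers of clusters containing u with those containing v.

    H maps frozenset({a, b}) to the weight of the inter-cluster edge [a, b].
    w_uv is the weight of the newly accepted edge (u, v).
    """
    for cu, du in find_centers(u).items():
        for cv, dv in find_centers(v).items():
            if cu == cv:
                continue                      # both endpoints share this center
            key = frozenset((cu, cv))
            weight = du + w_uv + dv           # path cu -> u -> v -> cv
            if weight < H.get(key, float("inf")):
                H[key] = weight
```

When every vertex lies in a constant number of clusters, the two nested loops
touch a constant number of pairs, which matches the O(1) bound stated above.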



Selecting the Cluster Centers for a Phase. In order for the Cluster-graph
function to be implemented efficiently, it needs to have the set of
cluster-centers as input. For the first phase, the cluster-centers $C_1$ are
identified in a greedy fashion using the weighted graph $G' = (V, E_0)$ with
real-valued edge weights, and using a radius of r. This is referred to as
Naive-Centers in the algorithm given in Fig. 1. Naive-Centers runs in
O(n log n) time, since this can be implemented using the standard Dijkstra’s
algorithm. For subsequent phases, cluster-centers are identified (using
Update-Centers) in a different way. The set of cluster-centers is always chosen
as a subset of the cluster-centers used in the previous phase. At the end of
each phase, the algorithm selects a set of cluster centers for the next phase.
These centers are guaranteed to be sufficiently far apart from each other. More
specifically, the cluster centers $C_i$ used in phase i are guaranteed to be at
a distance of at least r/2 from each other.

In phase i, the set of cluster-centers for phase i + 1 is computed as
$C_{i+1} := C_i \setminus M_i$, i.e., a subset $M_i$ of the cluster-centers is
deleted from the list of cluster-centers. We now describe how the set $M_i$ is
chosen. $M_1$ is the empty set, implying that $C_2$ is identical to $C_1$. For
i > 1, the algorithm picks a cluster-center from $C_i$ and deletes all
cluster-centers that are within distance r from it. (It is important to note
that since the integralization changes in every iteration, vertices that are at
distance r' in one iteration are at distance r'/2 in the next iteration.) This
is easily implemented by using the FindCenters query that is available after
the cluster-cover for phase i has been computed. The next cluster-center is
then picked and the process is continued until all centers are either picked or
marked. Clearly this process runs in time $O(m \cdot c_v)$. We now show that in
phase i the cluster-centers are guaranteed to be at a distance of at least r/2
from each other. In phase 1, since cluster centers are identified by using a
radius of r, all cluster centers are at a distance of at least r/2 from each
other. In phase i - 1, if two cluster-centers are at a distance of r or less,
then one of them will get marked and will subsequently be deleted from the list
$C_i$ for phase i. Lemma 2 specifies conditions under which vertices belong to
at most a constant number of clusters.

Lemma 2. If $C = \{v_1, v_2, \ldots, v_m\} \subseteq V(G)$ is such that for all
pairs of vertices $v_i, v_j \in C$ ($i \ne j$), $D_{\{G,Iwt\}}(v_i, v_j) > r'$,
and if $\mathcal{K}_{r''} = \{K_1, K_2, \ldots, K_m\}$ is returned by
Cluster-Cover(G, C, r''), and if $r'' \le c' \cdot r'$ for some constant c',
then each vertex $v \in V(G)$ is contained in at most a constant (which depends
on d and c') number of clusters from $\mathcal{K}_{r''}$.

The conditions of the lemma are true for the cluster graph as constructed 
above with r' = r/2 and c' = 2 or c' = At/5. Hence any vertex in H is part 
of at most a constant number of clusters in K-r or Kr. The proof follows from 
standard packing arguments. Similar arguments also show that the number of 
inter-cluster edges incident on a cluster-center is also a constant (although it 
might have a large number of intra-cluster edges). It follows that the degree of 
any vertex in H that is not a cluster-center must be a constant, and the size 
of H is 0(n). Note that since the weights have been integralized, the resulting 
clusters are approximate clusters; they are a little bit larger (since integers are 
always rounded up) than the exact clusters. 



3.3 Answering Shortest Path Queries 

When the algorithm Improved-Greedy considers an edge e = (u, v) for inclusion 
in the spanner graph, it needs to answer a shortest path query. It needs 
to check if $D_{G',wt}(u,v) \le t \cdot d(u,v)$, where G' is the spanner graph constructed 
so far. As noted earlier, it is sufficient for this query to be answered 
approximately. So, it is sufficient to devise a procedure to efficiently check if 
$D_{G',wt}(u,v) \le t(1+\epsilon') \cdot d(u,v)$, for some small $\epsilon' > 0$. In other words, it 
is sufficient to check if $D_{H,Iwt}(u,v) \le t(1+\epsilon'') \cdot d(u,v)/U_i$, for some small 
$\epsilon'' > 0$. In fact, the algorithm will check if $D_{H,Iwt}(u,v) \le t \cdot d(u,v)/U_i$. The 
time complexity of this test is a constant if the distance being tested is bounded 
by some constant multiple of r. Hence, we conclude this section by noting 
that the Improved-Greedy algorithm runs in time O(n log n). 
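
For orientation, the test "is the current spanner distance at most t times the Euclidean distance?" is the core of the plain greedy spanner construction. The sketch below is an assumption-laden baseline, not Improved-Greedy: it examines all pairs, answers each query exactly with a distance-bounded Dijkstra search on G' rather than approximately on the cluster graph H, and the names `bounded_distance` and `greedy_spanner` are hypothetical.

```python
import heapq
import math
from itertools import combinations

def bounded_distance(adj, src, dst, limit):
    """Shortest-path distance from src to dst, ignoring paths longer than limit."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= limit and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

def greedy_spanner(points, t):
    """Naive greedy t-spanner: add edge (i, j) only if G' has no path of
    length at most t * d(i, j) between its endpoints."""
    adj = {i: [] for i in range(len(points))}
    candidates = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    kept = []
    for d, i, j in candidates:            # edges in order of increasing length
        if bounded_distance(adj, i, j, t * d) > t * d:
            adj[i].append((j, d))
            adj[j].append((i, d))
            kept.append((i, j))
    return kept

# Usage: corners of the unit square with stretch factor 1.5.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
edges = greedy_spanner(square, 1.5)
```

With stretch 1.5 the four sides are kept and both diagonals are rejected, since each diagonal is already approximated by a two-edge path of length 2.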



3.4 The Graph Produced by Improved-Greedy Is a t-Spanner 

In order to show that a valid cluster graph H approximates the graph G', we 
need to prove lemmas that are analogous to Lemmas 3 and 4 from modified 
to account for the error introduced by the integralization. Next, we need to 
show that G' is a t-spanner for V. Since the clusters are computed using 
the function Iwt(·) instead of wt(·), clusters are not as precise as they were 
in The following claims are stated without proof, which will be provided in 
a full version of the paper. Consider the cluster graph H that results from the 
clustering performed on G' at the start of phase i. The following claims apply 
to edges and paths in H . Many of them are modified versions of corresponding 
lemmas in 






1. Let K be equal to Cluster(G', v, r, Iwt) (i.e., it is a cluster with cluster- 
center v and radius r = δW) computed in iteration i of the algorithm. If u 
is a vertex in K, then $D_{G',wt}(v,u) \le (1+\epsilon)\, r\, U_i$. Otherwise, if $u \notin K$, then 
$D_{G',wt}(v,u) > r\, U_i$. 

2. If u is a cluster-center and [m,u] is an intra-cluster edge in the cluster-graph 
H, then D{G\wt}{u,v) < (1 -f 

3. If [u, t;] is an inter-cluster edge, then ri < D^q' , wt}{u,v) < {l+e)-{Ri+2ri)Ui. 

4. If there exists a path $P_H$ in H between vertices u and v such that $Iwt(P_H) = 
L$, then there exists a path $P_{G'}$ between u and v such that $Iwt(P_{G'}) \le L$. 

5. Let H be a valid cluster graph of G' with cluster radii r and R = r/6. 
Let Pg' be a path between u and v in G' of weight Iwt(PG') such that 
D{G',wt}{u,v) > (1 + e)W — 25W. Then there exists a path P^ between u 

and V in H such that Iwi^Pn) < • Iwt(PG'). 

The above claims are enough to prove that H is a cluster graph for G', and 
consequently that G' is a t-spanner. As argued in Section 3.2, the resulting 
spanner graph has constant degree. The weight of the spanner is O(1) · wt(MST) 
because of the leapfrog property from the proof is omitted in this version. 
This concludes the proof of Theorem Q 

4 Conclusions and Acknowledgments 

We present improved algorithms for the sparse spanner problem. In the process, 
we design linear-time algorithms for a clustering problem, which is likely to be 
of independent interest. 

We are grateful to Professor Michiel Smid for helpful discussions and for 
pointing out errors in an earlier draft. 

Abstract. Let A and B be two convex polytopes in $\mathbb{R}^3$ with m and 
n facets, respectively. The penetration depth of A and B, denoted as 
$\pi(A,B)$, is the minimum distance by which A has to be translated so 
that A and B do not intersect. We present a randomized algorithm that 
computes $\pi(A,B)$ in $O(m^{3/4+\epsilon} n^{3/4+\epsilon} + m^{1+\epsilon} + n^{1+\epsilon})$ expected time, 
for any constant $\epsilon > 0$. It also computes a vector t such that $\|t\| = 
\pi(A,B)$ and $\mathrm{int}(A + t) \cap B = \emptyset$. We show that if the Minkowski sum 
$B \oplus (-A)$ has K facets, then the expected running time of our algorithm 
is $O(K^{1/2+\epsilon} m^{1/4} n^{1/4} + m^{1+\epsilon} + n^{1+\epsilon})$, for any $\epsilon > 0$. 

We also present an approximation algorithm for computing $\pi(A,B)$. For 
any $\delta > 0$, we can compute, in time $O(m + n + (\log^2(m+n))/\delta)$, a vector 
t such that $\|t\| \le (1+\delta)\pi(A,B)$ and $\mathrm{int}(A + t) \cap B = \emptyset$. Our result also 
gives a $\delta$-approximation algorithm for computing the width of A in time 
$O(n + (\log^2 n)/\delta)$, which is simpler and slightly faster than the recent 
algorithm by Chan P). 

1 Introduction 

Let A and B be two convex polytopes in $\mathbb{R}^3$ with m and n facets, respectively. 
The penetration depth of A and B, denoted as $\pi(A,B)$, is defined as 

$\pi(A,B) = \min\{\, \|t\| \;:\; \mathrm{int}(A + t) \cap B = \emptyset,\ t \in \mathbb{R}^3 \,\}$. 

One of the motivations for this problem comes from the field of robotics. Con- 
sider, for instance, the problem of collision detection in robot motion planning, 
where distance between objects is measured in the Euclidean metric. Numerous 
efficient algorithms are known for computing the minimum distance between 
two polyhedra in two and three dimensions (see EE!). Whenever two objects 
intersect, this distance measure is zero. Thus, it fails to provide any information 
about the extent of penetration. The penetration depth is a useful and natural 
measure of this extent. In addition, penetration depth can be a useful quantity 
to have available during physical simulations. Such simulations sample a moving 
system during discrete time steps and detect collisions between objects using 
a variety of methods. When a collision is detected, a penetration has usually 
occurred, because of the discrete time sampling. The penetration depth of the 
colliding bodies can be very useful in computing how to roll the simulation back 
to the instant of first contact, and in estimating the impulse force required for 
the appropriate collision response. 

The problem is closely related to that of computing the width of a convex 
polytope A. Recall that the width of A is the shortest distance between any 
pair of parallel planes that support A. We will note below that if A = B then 
$\pi(A,A) = \mathrm{width}(A)$. Thus the penetration depth is a natural extension of width. 
The best algorithm known for computing the width is by Agarwal and Sharir 
[2]; it is a randomized algorithm that runs in $O(n^{3/2+\epsilon})$ expected time, for any 
constant $\epsilon > 0$. This algorithm is based on a randomized algorithm, presented 
in [2], for computing the closest bichromatic pair of lines for two vertically- 
separated sets L and L' of lines in $\mathbb{R}^3$ in expected time 
$O(|L|^{3/4+\epsilon} |L'|^{3/4+\epsilon} + |L|^{1+\epsilon} + |L'|^{1+\epsilon})$, 
for any $\epsilon > 0$. We use this algorithm for computing $\pi(A,B)$ in 
expected time $O(m^{3/4+\epsilon} n^{3/4+\epsilon} + m^{1+\epsilon} + n^{1+\epsilon})$, for any $\epsilon > 0$. Actually, we will 
show that if the number of facets of the Minkowski sum $B \oplus (-A)$ is K, then the 
expected running time of the algorithm is $O(K^{1/2+\epsilon} m^{1/4} n^{1/4} + m^{1+\epsilon} + n^{1+\epsilon})$, for 
any $\epsilon > 0$. This is, to the best of our knowledge, the first subquadratic algorithm 
for computing $\pi(A,B)$. 

Dobkin et al. m showed that A and B can be preprocessed in 0{m+n) time 
so that, for a direction u, the distance by which A has to be translated in direction 
u to separate it from B, denoted as $\Delta(u)$, can be computed in $O(\log^2(m+n))$ 
time. We use this result to obtain a simple approximation algorithm for 
computing $\pi(A,B)$. In particular, for any given $\delta > 0$, we present an 
$O(m + n + (\log^2(m+n))/\delta)$-time algorithm for computing a vector u such that 
$\mathrm{int}(A + u) \cap B = \emptyset$ and $\|u\| \le (1+\delta)\pi(A,B)$. 

Our results imply an “output-sensitive” algorithm for computing the width of 
a convex polytope A with n facets in randomized expected time 
$O(K^{1/2+\epsilon} n^{1/2} + n^{1+\epsilon})$, 
where K is the number of facets in $A \oplus (-A)$, and a $(1+\delta)$-approximation 
algorithm for computing the width of A in time $O(n + (\log^2 n)/\delta)$. This 
approximation algorithm is simpler and slightly faster than the recent algorithm 
by Chan which computes a $(1+\delta)$-approximation of width(A) in time 
$O(n + (\log^c n)/\delta)$ for some constant c > 2. Finally, we show how the Dobkin- 
Kirkpatrick hierarchical representations of two convex polytopes A and B can 
be used to obtain efficient implementation of various extremal queries concerning 
the Minkowski sum $B \oplus (-A)$ without its explicit construction. For lack of 
space, we omit this part from the current abstract. 

2 Computing the Penetration Depth 

Before describing the algorithm for computing $\pi(A,B)$, we note the relationship 
between the penetration depth of two polytopes and the width of a polytope. 

Proposition 1 For any convex polytope P in $\mathbb{R}^3$, $\mathrm{width}(P) = \pi(P,P)$. 

Proof. Let $\Delta$ denote the length of the shortest translation vector that separates 
two initially-identical copies of P. Let v be the vector realizing the width of 
P; that is, v is a shortest vector between two parallel supporting planes of P 
that realize the width of P. Clearly, $\mathrm{int}(P + v) \cap \mathrm{int}(P) = \emptyset$, and therefore 
$\Delta \le \|v\| = \mathrm{width}(P)$. As for the other direction, let u be a shortest separating 
translation vector. Clearly, P and P + u touch each other but have disjoint 
interiors. Thus, there is a plane H that separates the interiors of P and P + u, 
and intersects both P and P + u. In particular, P lies between the two planes H 
and H − u. Since the distance between H and H − u is at most $\|u\|$, it follows 
that 

$\mathrm{width}(P) \le d(H, H - u) \le \|u\| = \Delta$. 

This proposition suggests that we attempt to modify the width-algorithm by 
Agarwal and Sharir [2] to compute $\pi(A,B)$, which is indeed what we proceed to 
do. Conversely, we will also specialize the new techniques developed in this paper 
to obtain new approximation and output-sensitive algorithms for computing 
width(A). 

Overall algorithm. Let A and B be two convex polytopes as defined above. 
Using linear programming, we can determine in O(m + n) time whether A and 
B intersect [3]. If A and B do not intersect, then we set $\pi(A,B) = 0$ and stop. 
So we assume that $A \cap B \ne \emptyset$. 

We can formulate the problem of computing $\pi(A,B)$ in terms of the configuration 
space that represents all possible placements of A relative to (the fixed) 
B. That is, A turns into a point p(A) and B turns into the Minkowski sum 
$B \oplus (-A) = \{x - y \mid x \in B,\ y \in A\}$. Let us assume that the initial location of 
the point p(A) corresponding to A in the configuration space is the origin O of 
the coordinate system. Note that p(A) is inside the polytope $P = B \oplus (-A)$ if 
and only if A (in the corresponding translated placement) and B intersect. By 
construction, it follows that 

$\pi(A,B) = \min\{\, d(O,x) \mid x \in \partial P \,\}$. 

Let $x \in \partial P$ be a point on the boundary so that $d(x,O) = d(O,\partial P)$. Then Ox 
is orthogonal to the facet of P containing x, because otherwise we could obtain 
an even shorter distance from O to $\partial P$, which is impossible (alternatively, the 
penetration distance is the smallest radius of a ball, centered at the origin, 
that touches the boundary of P). Therefore, d(x,O) is attained as a shortest 
distance between O and a plane that contains the corresponding facet of P. In 
particular, we can compute the penetration distance by computing the distance 
between the origin and all those planes. 
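
To make the reduction concrete, here is a brute-force planar analogue, offered purely as an illustration and not as the paper's three-dimensional algorithm. It builds $B \oplus (-A)$ explicitly as the convex hull of all vertex differences and takes the minimum distance from the origin to the supporting lines of the hull edges. The function names and the use of a monotone-chain hull are assumptions of this sketch.

```python
import math

def _cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Monotone-chain convex hull, vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def penetration_depth_2d(A, B):
    """Planar penetration depth of convex polygons A and B (vertex lists).

    P = B (+) (-A) is the hull of all differences b - a; the depth is the
    smallest distance from the origin to a line supporting an edge of P.
    """
    diff = [(bx - ax, by - ay) for (bx, by) in B for (ax, ay) in A]
    hull = convex_hull(diff)
    best = math.inf
    for p, q in zip(hull, hull[1:] + hull[:1]):
        ex, ey = q[0] - p[0], q[1] - p[1]
        # Signed distance from the origin to line pq; positive means the
        # origin is on the interior (left) side of this CCW edge.
        s = (ex * (0 - p[1]) - ey * (0 - p[0])) / math.hypot(ex, ey)
        if s < 0:
            return 0.0          # origin outside P: interiors already disjoint
        best = min(best, s)
    return best

unit = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)]
depth = penetration_depth_2d(unit, shifted)
```

For the two unit squares above, translating the first by 0.5 in the negative x direction separates them, and calling the function with both arguments equal to `unit` returns the width of the square, in line with Proposition 1.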

Every facet of P is attained as a Minkowski sum of the form $g \oplus (-f)$, where 
g is a facet, edge, or vertex of B and f is, respectively, a vertex, edge, or facet 
of A. It is well known that there are only O(m + n) facets of P for which g is 
a facet or a vertex of B (and f is a vertex or a facet of A), and they can all 
be found in O((m + n) log(m + n)) time (see e.g. 0). Hence, determining the 
minimum distance from O to all these facets can be done in near-linear time. 
The problem is to handle facets that are attained as Minkowski sums of pairs 
of edges of the form $e \oplus (-e')$ such that e is an edge of B and e' is an edge of 
A. In the worst case, there can be $\Omega(mn)$ such facets. However, not every such 
pair necessarily generates a facet of P. 

We partition the edges of A and B into a family of pairs of subsets of edges 

$\mathcal{F} = \{(A_1, B_1), \ldots, (A_u, B_u)\}$ 

such that the following five conditions hold. 

(C1) $A_i$ (resp. $B_i$) is a subset of the edges of A (resp. B). 

(C2) Every pair $(e', e) \in A_i \times B_i$ generates a facet of P. 

(C3) Every pair of edges that generate a facet of P appears in some $A_j \times B_j$. 

(C4) For each i, the lines supporting the edges in $A_i$ and those in $B_i$ are vertically 
separated. That is, either all lines supporting the edges of $A_i$ lie above all 
lines supporting the edges of $B_i$, or all of them lie below the lines supporting 
the edges of $B_i$. 

(C5) $\mathcal{F}$ can be partitioned into two subfamilies $\mathcal{F}^A$ and $\mathcal{F}^B$ such that 

(i) for every $0 \le i \le \lfloor \log_2 m \rfloor$, there are $O((m/2^i) \log m)$ pairs $(A_j, B_j)$ in 
$\mathcal{F}^A$ for which $2^i \le |A_j| < 2^{i+1}$. Let $\mathcal{F}^A_i$ denote the subset of these pairs. 
Then $\sum_{(A_j,B_j) \in \mathcal{F}^A_i} |B_j| = O(n \log n)$; and 

(ii) for every $0 \le i \le \lfloor \log_2 n \rfloor$, there are $O((n/2^i) \log n)$ pairs $(A_j, B_j)$ in 
$\mathcal{F}^B$ for which $2^i \le |B_j| < 2^{i+1}$. Let $\mathcal{F}^B_i$ denote the subset of these pairs. 
Then $\sum_{(A_j,B_j) \in \mathcal{F}^B_i} |A_j| = O(m \log m)$. 

Suppose we have such a decomposition at our disposal. Then we can compute 
tt(A, B) as follows. 

Algorithm: Penetration-Depth (A, B) 

1. For each pair (f, g) such that f is a vertex or facet of A and g is a facet or 
vertex of B, and $g \oplus (-f)$ is a facet of P, compute the distance from the 
origin to the plane containing $g \oplus (-f)$. Let $\Delta^*$ be the minimum of these 
distances. 

2. For each pair $(A_i, B_i)$ in the above decomposition, find the minimum distance 
$\Delta_i$ from the origin to an element in the set of planes 

$H_i = \{\, \mathrm{aff}(e \oplus (-e')) \mid e \in B_i,\ e' \in A_i \,\}$, 

where $\mathrm{aff}(e \oplus (-e'))$ is the plane containing the facet of $B \oplus (-A)$ induced 
by e and e'. 

3. Return $\min\{\Delta^*, \min_i \{\Delta_i\}\}$. 

The correctness of this algorithm is obvious. Step 1 considers all facets of P 
induced by a vertex-facet pair of A and B. By Condition (C2), the algorithm 
considers only those pairs of edges that generate facets of P, and by Condition 
(C3), the algorithm considers all such pairs. It thus suffices to show how to 
compute $\Delta_i$, for each pair $(A_i, B_i)$, and how to construct the family $\mathcal{F}$. 



Computing $\Delta_i$. Let $(A_i, B_i)$ be a pair in $\mathcal{F}$. Denote by $L_i$ and $L'_i$, respectively, 
the sets of lines that contain the edges of $B_i$ and $A_i$. 

Lemma 2. For any pair $(A_i, B_i) \in \mathcal{F}$, $\Delta_i = d(L_i, L'_i)$. 

Proof. Let $e \in B_i$ and $e' \in A_i$, and let $\ell$ and $\ell'$ be the lines that contain e and 
e', respectively. Consider the plane $h = \ell \oplus (-\ell') = \{x - y \mid x \in \ell,\ y \in \ell'\}$. 
Note that for any two sets X and Y, 

$d(O, X \oplus (-Y)) = \inf\{\, \|x - y\| \mid x \in X,\ y \in Y \,\} = d(X, Y)$. 

Therefore $d(O, h) = d(O, \ell \oplus (-\ell')) = d(\ell, \ell')$. Thus, 

$\Delta_i = \min_{h \in H_i} d(O, h)$ 

$= \min\{\, d(O, \ell \oplus (-\ell')) \mid \ell \in L_i,\ \ell' \in L'_i \,\}$ 

$= \min\{\, d(\ell, \ell') \mid \ell \in L_i,\ \ell' \in L'_i \,\}$ 

$= d(L_i, L'_i)$. 
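
The identity at the heart of the lemma, namely that the distance from the origin to the plane $\ell \oplus (-\ell')$ equals the distance between the two lines, can be checked numerically. The sketch below is illustrative only; the helper names are hypothetical, and the lines are assumed non-parallel and given by a point and a direction. The two quantities are computed independently: one from the plane's normal, the other by solving for the closest pair of points.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def origin_to_difference_plane(p, d, q, e):
    """Distance from the origin to the plane {(p + s d) - (q + t e)}.

    The plane passes through p - q and is spanned by d and e, so its
    normal is d x e.
    """
    n = cross(d, e)
    return abs(dot(sub(p, q), n)) / math.sqrt(dot(n, n))

def line_distance(p, d, q, e):
    """Distance between lines p + s d and q + t e via the closest point pair."""
    w = sub(p, q)
    a, b, c = dot(d, d), dot(d, e), dot(e, e)
    dw, ew = dot(d, w), dot(e, w)
    det = a * c - b * b                 # nonzero for non-parallel lines
    s = (b * ew - c * dw) / det
    t = (a * ew - b * dw) / det
    gap = tuple(wi + s * di - t * ei for wi, di, ei in zip(w, d, e))
    return math.sqrt(dot(gap, gap))

# Two skew lines: the x-axis lifted to z = 1 and the y-axis lowered to z = -1.
args = ((0, 0, 1), (1, 0, 0), (0, 0, -1), (0, 1, 0))
plane_dist = origin_to_difference_plane(*args)
pair_dist = line_distance(*args)
```

Both values agree, which is exactly what lets the algorithm replace a search over planes of the Minkowski sum by a closest-pair search over lines.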

By the above lemma, computing $\Delta_i$ reduces to computing a closest bichromatic 
pair of lines in $L_i \times L'_i$. Recall that by Condition (C4) on $\mathcal{F}$, the lines 
in $L_i$ and $L'_i$ are vertically separated. Agarwal and Sharir [2] showed that under 
this condition, the closest pair in $L_i \times L'_i$ can be computed in expected time 
$O(|L_i|^{3/4+\epsilon} |L'_i|^{3/4+\epsilon} + |L_i|^{1+\epsilon} + |L'_i|^{1+\epsilon})$. Summing this bound over all pairs in 
$\mathcal{F}$ and using property (C5), we can prove that the total time spent in computing 
all $\Delta_i$'s is $O(m^{3/4+\epsilon} n^{3/4+\epsilon} + m^{1+\epsilon} + n^{1+\epsilon})$, for any $\epsilon > 0$. 

Computing $\mathcal{F}$. Our decomposition is based on the following observation. Let $\mathcal{M}$ 
denote the Gaussian diagram (or normal diagram) of B. $\mathcal{M}$ is a spherical map 
on the unit sphere $S^2$. The vertices of $\mathcal{M}$ are points on $S^2$, each representing 
the direction of the outward normal of a facet of B, the edges of $\mathcal{M}$ are great 
circular arcs, each being the locus of the outward normal directions of all planes 
supporting B at some fixed edge, and the faces of $\mathcal{M}$ are regions, each being the 
locus of outward normal directions of all planes supporting B at a vertex. $\mathcal{M}$ 
can be computed in linear time from B. Let $\mathcal{M}'$ be the similarly-defined normal 
diagram of −A. Consider the superposition of $\mathcal{M}$ and $\mathcal{M}'$. Each intersection 
point between an arc of $\mathcal{M}$ and an arc of $\mathcal{M}'$, representing respectively an edge 
e of B and an edge e' of A, gives us a direction u which is orthogonal to the 
plane containing the Minkowski sum $e \oplus (-e')$. Furthermore, $e \oplus (-e')$ is a real 
facet of $B \oplus (-A)$. It follows that a pair of edges of A and B generates a facet 
of $B \oplus (-A)$ if and only if the corresponding arcs intersect in the overlapped 
diagram. Note that the number of such arc intersections on this diagram can be 
$\Omega(nm)$. 

Our goal is thus to decompose all pairs of intersecting arcs of $\mathcal{M}$ and $\mathcal{M}'$. 
Without loss of generality, assume that no intersection point of $\mathcal{M}$ and $\mathcal{M}'$ lies 
on the equator. (We can either handle these intersections separately, or perform 
a random simultaneous rotation on $\mathcal{M}$ and $\mathcal{M}'$.) If an arc of $\mathcal{M}$ or $\mathcal{M}'$ crosses the 
equator, we split it into two by adding a vertex on the arc at the equator. Hence 
each arc lies completely in the upper or the lower hemisphere. Let H denote the 
upper hemisphere of $S^2$. We will describe how we decompose the set of edges 
of A and B whose corresponding arcs intersect in H; the lower hemisphere is 
handled similarly. 

Note that the arcs in $\mathcal{M}$ (and in $\mathcal{M}'$) are pairwise disjoint. We centrally 
project the arcs of $\mathcal{M}$ and $\mathcal{M}'$ that lie in H onto the plane h : z = 1. Since each 
arc of $\mathcal{M}$ and $\mathcal{M}'$ is a portion of a great circle, it projects to a segment (or a 
ray) on h. Let E (resp. E') be the set of projected segments of arcs in $\mathcal{M}$ (resp. 
$\mathcal{M}'$). By construction, the interiors of the segments in E (or E') are pairwise 
disjoint. 
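
The claim that a great-circle arc in the open upper hemisphere projects centrally to a straight segment on the plane z = 1 is easy to verify numerically. The following sketch is an illustration with hypothetical helper names; it samples points along an arc and measures how far their projections stray from the segment joining the projected endpoints.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def central_projection(p):
    """Project a point with z > 0 from the origin onto the plane z = 1."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must lie in the open upper hemisphere")
    return (x / z, y / z)

def great_circle_arc(a, b, k):
    """k + 1 points on the shorter great-circle arc between unit vectors a, b
    (obtained by normalizing convex combinations; a and b not antipodal)."""
    return [
        normalize(tuple((1 - s) * ai + s * bi for ai, bi in zip(a, b)))
        for s in (i / k for i in range(k + 1))
    ]

def max_deviation_from_segment(a, b, k=20):
    """Largest distance of a projected arc sample from the line through the
    projected endpoints; zero (up to rounding) if the image is straight."""
    pa, pb = central_projection(a), central_projection(b)
    dx, dy = pb[0] - pa[0], pb[1] - pa[1]
    length = math.hypot(dx, dy)
    worst = 0.0
    for p in great_circle_arc(a, b, k):
        q = central_projection(p)
        worst = max(worst, abs(dx * (q[1] - pa[1]) - dy * (q[0] - pa[0])) / length)
    return worst

a = normalize((1.0, 0.0, 1.0))
b = normalize((0.0, 2.0, 1.0))
deviation = max_deviation_from_segment(a, b)
```

The deviation is zero up to rounding error, because every point of the arc is a positive combination of a and b, and central projection maps such combinations to convex combinations of the projected endpoints.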

As described in 0, we decompose the set of intersecting pairs of segments in 
E and E' into a family $\mathcal{F}' = \{(E_1, E'_1), \ldots, (E_u, E'_u)\}$ as follows. We construct 
two segment trees $T_A$ and $T_B$ on the segments of E and E', respectively. Each 
node v of $T_A$ (resp. $T_B$) corresponds to a vertical strip, with an associated subset 
$E_v \subseteq E$ (resp. $E'_v \subseteq E'$) of segments that completely cross the strip. For each such subset, 
we construct a balanced binary tree, sorted by the height of those segments 
inside the strip (the segments are nonintersecting, and thus the ordering is well 
defined). For each node w of this binary tree, we refer to the subset of segments 
stored in the subtree rooted at w as a canonical subset. 
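
The role of canonical subsets can be illustrated on the sorted order alone. The sketch below assumes, for illustration, that the segments of a strip are indexed 0..n-1 by height and that a query selects a contiguous index range [lo, hi); it then lists the O(log n) nodes of an implicit balanced binary tree whose subtrees exactly cover that range. The function name and the half-open interval convention are choices of this sketch, not of the paper.

```python
def canonical_nodes(lo, hi, node_lo, node_hi):
    """Decompose the index range [lo, hi) into canonical subsets.

    Each tree node is identified with the half-open index interval
    [node_lo, node_hi) of the leaves stored in its subtree.
    """
    if hi <= node_lo or node_hi <= lo:
        return []                              # node disjoint from the query
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]            # whole subtree is reported
    mid = (node_lo + node_hi) // 2
    return (canonical_nodes(lo, hi, node_lo, mid)
            + canonical_nodes(lo, hi, mid, node_hi))

# Usage: eight segments sorted by height, query range [1, 7).
pieces = canonical_nodes(1, 7, 0, 8)
```

Here the range [1, 7) is covered by the four canonical subsets [1, 2), [2, 4), [4, 6), and [6, 7), so a query is charged to a logarithmic number of tree nodes rather than to individual segments.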

For each segment e of E' (resp. E), we find the nodes v of $T_A$ (resp. $T_B$) such 
that at least one endpoint of e lies inside the strip associated with the parent 
of v; there is a logarithmic number of such nodes. We report all segments of $E_v$ 
(resp. $E'_v$) intersected by the segment as the union of a logarithmic number of 
canonical subsets. After repeating this step for all segments, for each canonical 
subset $E_w$ of $T_A$, we report the pair $(E_w, E'_w)$, where $E'_w$ is the subset of segments 
for which the query procedure returned $E_w$ as one of the canonical subsets. We 
do the same for the canonical subsets of $T_B$. It is shown in 0 that if segments 
$e \in E$, $e' \in E'$ intersect, then there is one such pair $(E_z, E'_z)$ such that $e \in E_z$ 
and $e' \in E'_z$, and that the total time spent is O((m + n) log(m + n)). Finally, 
for each pair $(E_w, E'_w)$, let $A_w$ (resp. $B_w$) be the set of corresponding edges of 
A and B. We add the pair $(A_w, B_w)$ to $\mathcal{F}$. $\mathcal{F}^A$ (resp. $\mathcal{F}^B$) is the subset of pairs 
corresponding to the canonical subsets of $T_A$ (resp. $T_B$). The argument in ^ 
shows that $\mathcal{F}$ satisfies conditions (C1)-(C3) and (C5). Condition (C4) follows 
from the following lemma. 

Lemma 3. Let e be an edge of B and e' an edge of A such that the corresponding 
arcs intersect in H. Then the line supporting e lies above the line supporting e'. 

Proof. Since the arcs corresponding to e and e' intersect in H, the sum $e \oplus (-e')$ 
is a facet of $B \oplus (-A)$ with an outward normal direction u that points upwards. 
By construction of the diagrams, there are planes h, h' orthogonal to u and 
supporting, respectively, B at e and A at e'. Moreover, relative to the direction 
u, B lies below h and A lies above h'. It follows that since A and B intersect, 
the plane h is above the plane h' relative to the direction u. Thus also the line 
$\ell$ containing e is above the line $\ell'$ containing e' relative to the direction u. We 
assume that the vertices of A and B are in general position. In particular, 
no four vertices lie in the same plane. Thus the lines $\ell$ and $\ell'$ are not 
parallel. Let $\ell_0$ be the unique upward-directed vertical line that passes through 
$\ell$ and $\ell'$. Since the angle between $\ell_0$ and u is smaller than $\pi/2$, and a line in 
direction u crosses h' before h, it follows that $\ell_0$ also crosses h' (at a point on $\ell'$) 
before it crosses h (at a point on $\ell$). Hence $\ell$ lies vertically above $\ell'$, as claimed. 

Hence, we conclude the following. 

Theorem 1. Given two convex polytopes A, B in $\mathbb{R}^3$ with m and n vertices, 
respectively, the penetration depth of A and B can be computed in (randomized 
expected) time $O(m^{3/4+\epsilon} n^{3/4+\epsilon} + m^{1+\epsilon} + n^{1+\epsilon})$, for any $\epsilon > 0$; the constant of 
proportionality depends on $\epsilon$. 

An output-sensitive bound. Let K denote the number of facets in $P = B \oplus (-A)$. 
We derive a bound on the expected running time of the algorithm that depends 
on K. Note that the pair $(A_i, B_i)$ contributes $|A_i| \cdot |B_i|$ facets to P. The expected 
running time of the algorithm is 

$\sum_{i=1}^{u} O\big( (|A_i||B_i|)^{3/4+\epsilon} + |A_i|^{1+\epsilon} + |B_i|^{1+\epsilon} \big) 
= O\Big( K^{\epsilon} \sum_{i=1}^{u} (|A_i||B_i|)^{3/4} + m^{1+\epsilon} + n^{1+\epsilon} \Big)$. 

We obtain a bound on $\sum_j (|A_j||B_j|)^{3/4}$ for pairs in $\mathcal{F}^A$; a similar argument bounds 
the quantity for pairs in $\mathcal{F}^B$. Let $K_i$ be the number of facets contributed by the 
pairs in $\mathcal{F}^A_i$. Recall that $|\mathcal{F}^A_i| = O((m/2^i) \log m)$. 
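
The displayed chain of inequalities that belongs at this point is not present in this copy of the text. One chain consistent with the surrounding statements is sketched below; it is a reconstruction under the assumption that the intended argument combines the size bound $|A_j| < 2^{i+1}$ with Hölder's inequality, and the symbol $\mathcal{F}^A_i$ for the i-th size class is notation adopted here for illustration.

```latex
\begin{align*}
\sum_{(A_j,B_j)\in\mathcal{F}^A_i} \bigl(|A_j|\,|B_j|\bigr)^{3/4}
  &\le 2^{3(i+1)/4} \sum_{(A_j,B_j)\in\mathcal{F}^A_i} |B_j|^{3/4}
     && \text{since } |A_j| < 2^{i+1} \\
  &\le 2^{3(i+1)/4}\, \bigl|\mathcal{F}^A_i\bigr|^{1/4}
       \Bigl(\sum_{(A_j,B_j)\in\mathcal{F}^A_i} |B_j|\Bigr)^{3/4}
     && \text{by H\"older's inequality} \\
  &\le 2^{3(i+1)/4}\, \bigl|\mathcal{F}^A_i\bigr|^{1/4}
       \Bigl(\frac{K_i}{2^i}\Bigr)^{3/4}
     && \text{since } K_i \ge 2^i \textstyle\sum |B_j| \\
  &= O\!\left( (m\log m)^{1/4}\, \frac{K_i^{3/4}}{2^{i/4}} \right)
     && \text{since } \bigl|\mathcal{F}^A_i\bigr| = O\bigl((m/2^i)\log m\bigr).
\end{align*}
```

In this version the second-to-last step is the one that uses the lower bound on $K_i$, which matches the remark that follows.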
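where the second-to-last inequality follows from the fact that 

$K_i = \sum_{(A_j,B_j) \in \mathcal{F}^A_i} |A_j||B_j| \ge 2^i \sum_{(A_j,B_j) \in \mathcal{F}^A_i} |B_j|$. 

On the other hand, $K_i \le 2^{i+1} \sum_{(A_j,B_j) \in \mathcal{F}^A_i} |B_j| \le c\, 2^i n \log n$ for a constant 
c > 1. The term $\sum_i K_i^{3/4} / 2^{i/4}$ is therefore maximized when $K_i = c\, 2^i n \log n$ for 
$0 \le i \le \log_2 \frac{K}{c n \log n}$ and 0 otherwise. Hence, 

$\sum_{i=0}^{\log_2 m} O\Big( \frac{K_i^{3/4}}{2^{i/4}} \Big) 
= \sum_{i=0}^{\log_2 \frac{K}{c n \log n}} O\big( 2^{i/2} (n \log n)^{3/4} \big) 
= O\big( \sqrt{K}\, (n \log n)^{1/4} \big)$. 

Therefore $\sum_j (|A_j| \cdot |B_j|)^{3/4} = O\big( \sqrt{K}\, (mn \log m \log n)^{1/4} \big)$, which implies the 
following. 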

Theorem 2. Given two intersecting convex polytopes A, B in $\mathbb{R}^3$ with m and 
n vertices, respectively, such that $B \oplus (-A)$ has K facets, one can compute the 
penetration depth of A and B in (randomized expected) time 
$O(K^{1/2+\epsilon} m^{1/4} n^{1/4} + m^{1+\epsilon} + n^{1+\epsilon})$, for any $\epsilon > 0$. 

An immediate corollary of the above theorem is the following. 

Corollary 4 Given a convex polytope A in $\mathbb{R}^3$ with n vertices such that $A \oplus (-A)$ 
has K facets, one can compute the width of A in (randomized expected) time 
$O(K^{1/2+\epsilon} \sqrt{n} + n^{1+\epsilon})$, for any $\epsilon > 0$. 



3 An Approximation Algorithm 

We now present an efficient algorithm for approximating the penetration depth 
of A and B. That is, for a given $\delta > 0$, the algorithm computes a translation 
vector t such that A + t and B are disjoint and $\|t\| \le (1+\delta)\pi(A,B)$. The 
algorithm is as follows. 

Algorithm: Approx-Separation (A, B) 

1. Define on the unit sphere of directions a grid G of points in the following 
manner: Divide the interval of angles $[0, \pi]$ into $\lceil c_1/\sqrt{\delta} \rceil$ subintervals of equal 
length, delimited by the points $0 = p_0, p_1, \ldots, p_{\lceil c_1/\sqrt{\delta} \rceil} = \pi$, where $c_1$ is a 
constant independent of $\delta$ which will be specified later. Then the grid G is 
defined as the set of points 

$G = \{\, (p_i, 2p_j) \mid 0 \le i, j \le \lceil c_1/\sqrt{\delta} \rceil \,\}$, 

where the points are given in spherical coordinates $(\varphi, \theta)$. 

2. For each point $p \in G$ apply the following procedure: Perform in the configuration 
space a ray-shooting query from the origin in the direction Op. Let 
$\Delta(p)$ be the Euclidean distance from O to the boundary of $B \oplus (-A)$ in this 
direction. (We will explain below how the ray-shooting can be performed 
efficiently without explicit computation of the Minkowski sum.) 

3. Output $\Delta = \min_{p \in G} \{\Delta(p)\}$ as an approximate solution. 
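
The three steps can be sketched as follows, under assumptions that differ from the paper's setting: the polytope $P = B \oplus (-A)$ is supplied explicitly as a list of halfspaces $a \cdot x \le b$ with the origin strictly inside, ray shooting is done by brute force over all halfspaces rather than through hierarchical representations, and the value $c_1 = 4$ is a guess made for this illustration, not the constant from the analysis.

```python
import math

def ray_shoot(halfspaces, u):
    """Distance from the origin along unit direction u to the boundary of
    {x : a.x <= b for all (a, b)}, assuming every b > 0 (origin inside)."""
    best = math.inf
    for a, b in halfspaces:
        s = a[0] * u[0] + a[1] * u[1] + a[2] * u[2]
        if s > 1e-12:                    # the ray moves towards this facet plane
            best = min(best, b / s)
    return best

def approx_separation(halfspaces, delta, c1=4.0):
    """Grid of directions (phi, theta) on the sphere; returns the smallest
    ray-shooting distance found and the direction that attains it."""
    n = math.ceil(c1 / math.sqrt(delta))
    best, best_dir = math.inf, None
    for i in range(n + 1):
        phi = math.pi * i / n            # p_i in [0, pi]
        for j in range(n + 1):
            theta = 2.0 * math.pi * j / n    # 2 p_j in [0, 2 pi]
            u = (math.sin(phi) * math.cos(theta),
                 math.sin(phi) * math.sin(theta),
                 math.cos(phi))
            d = ray_shoot(halfspaces, u)
            if d < best:
                best, best_dir = d, u
    return best, best_dir

# Usage: a box of half-width 2 cut by one tilted plane at distance 0.4.
box = [((1, 0, 0), 2.0), ((-1, 0, 0), 2.0), ((0, 1, 0), 2.0),
       ((0, -1, 0), 2.0), ((0, 0, 1), 2.0), ((0, 0, -1), 2.0)]
tilted = [((1 / 3, 2 / 3, 2 / 3), 0.4)]
exact = min(b / math.sqrt(sum(c * c for c in a)) for a, b in box + tilted)
approx, direction = approx_separation(box + tilted, delta=0.1)
```

The returned value can never be smaller than the exact depth, since every ray ends on the boundary of P, and a direction within a small angle of the optimal facet normal overshoots only by a factor of one over the cosine of that angle.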



Lemma 5. For any $\delta > 0$, algorithm Approx-Separation computes correctly 
a translation of length $\Delta$ that separates A and B, such that $\Delta \le (1+\delta)\pi(A,B)$. 

Proof. See ^ Section 3]. 

The size of the grid built by the Approx-Separation algorithm is $O(1/\delta)$. 
It was shown by Dobkin et al. m, that after a linear-time preprocessing of A 
and B into suitable data structures, the shortest separation of A and B along 
any query direction u can be computed in time $O(\log^2(m+n))$. This operation is 
equivalent to performing a ray shooting in the direction u from the origin towards 
$\partial P$. Therefore, the total running time of the algorithm Approx-Separation 
is $O(m + n + (\log^2(m+n))/\delta)$. We have thus shown: 

Theorem 3. Given two convex polytopes A and B in $\mathbb{R}^3$ with m and n facets, 
respectively, and a parameter $\delta > 0$, one can compute, in time $O(m + n + 
(\log^2(m+n))/\delta)$, a separating translation for A and B whose length is at most 
$(1+\delta)\pi(A,B)$. 

Applying Proposition 1 we also obtain the following corollary: 

Corollary 6 For any $\delta > 0$, a $(1+\delta)$-approximation of the width of a convex 
polytope in $\mathbb{R}^3$ with n facets can be computed in time $O(n + (\log^2 n)/\delta)$. 

Penetration depth under polyhedral metric. Another application of this technique 
is to obtain a linear-time algorithm for computing the penetration depth of 
A and B under any polyhedral norm (whose unit ball is a polytope with O(1) 
facets, such as the $L_1$ or $L_\infty$ norms). Let Q be a centrally-symmetric convex 
polytope with O(1) facets, and let $\|\cdot\|_Q$ denote the norm induced by Q. We 
observe that the $\|\cdot\|_Q$-distance from O to the boundary of P is equal to the 
largest scaling factor $\lambda$ such that $\lambda Q \subseteq P$. As is easily seen, a vertex of $\lambda Q$ must 
then touch $\partial P$. Moreover, when $\lambda$ varies, each vertex of $\lambda Q$ traces a ray from 
the origin. Hence, to find the largest $\lambda$, we perform ray shootings from O in each 
of the O(1) directions of the vertices of Q. For each of these ray shootings we 
compute the scaling $\lambda$ that corresponds to the hitting point of that ray with $\partial P$. 
The smallest of these $\lambda$'s is the desired $\|\cdot\|_Q$-length of the shortest separating 
translation. We have thus shown: 

Corollary 7 The shortest separating translation of two convex polytopes A and 
B in $\mathbb{R}^3$ with m and n facets, respectively, under any polyhedral norm (whose 
unit ball has O(1) facets) can be computed in O(m + n) time. 
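
The vertex-by-vertex ray shooting can be illustrated as follows. As before, this sketch assumes that P is given explicitly by halfspaces $a \cdot x \le b$ with the origin strictly inside, and it shoots rays by brute force; the function name and the example polytope are hypothetical.

```python
import itertools
import math

def polyhedral_depth(halfspaces, q_vertices):
    """Largest lambda with lambda * Q inside P = {x : a.x <= b}.

    For each vertex v of the unit ball Q, the ray {lambda * v} leaves P at
    lambda_v = min over facets with a.v > 0 of b / (a.v); the answer is the
    smallest lambda_v over all vertices.
    """
    best = math.inf
    for v in q_vertices:
        for a, b in halfspaces:
            s = sum(ai * vi for ai, vi in zip(a, v))
            if s > 1e-12:
                best = min(best, b / s)
    return best

# Unit balls of the two norms named in the text, as vertex lists.
linf_ball = list(itertools.product((-1, 1), repeat=3))          # cube
l1_ball = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
           (0, -1, 0), (0, 0, 1), (0, 0, -1)]                   # octahedron

# Example P: a box of half-width 2 cut by the plane x + y <= 1.
P = [((1, 0, 0), 2.0), ((-1, 0, 0), 2.0), ((0, 1, 0), 2.0),
     ((0, -1, 0), 2.0), ((0, 0, 1), 2.0), ((0, 0, -1), 2.0),
     ((1, 1, 0), 1.0)]
depth_linf = polyhedral_depth(P, linf_ball)
depth_l1 = polyhedral_depth(P, l1_ball)
```

For this example the plane x + y = 1 is reached at $L_\infty$-distance 0.5 (at the point (0.5, 0.5, 0)) but only at $L_1$-distance 1, so the two norms give different separating translations for the same configuration.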

Handling shallow penetrations. If the penetration of A into B is relatively small, 
then one might expect that the following combinatorial property holds in practice. 
Let $\delta > 0$ be a small parameter. Then the number of facets of $P = B \oplus (-A)$ 
whose distance from the origin is at most $(1+\delta)\pi(A,B)$ is small, say $K_\delta$. If this 
is the case, then the following more efficient algorithm computes $\pi(A,B)$. 

Algorithm: Shallow-Penetration (A, B) 

1. Construct the grid G as in Algorithm Approx-Separation. 

2. Using Algorithm Approx-Separation, compute a real value $\Delta$ such that 
$\Delta \le (1 + \delta/4)\pi(A,B)$. 

3. Compute $G' = \{\, u \in G \mid \Delta(u) \le (1 + \delta/4)\Delta \,\}$. 

4. Let $\mathbb{B}$ be the ball of radius $(1 + \delta/4)\Delta \le (1+\delta)\pi(A,B)$ centered at the 
origin. 

5. For each $u \in G'$, do the following: 

(i) Compute the face f of P supported by the plane orthogonal to u. 

(ii) By performing an implicit breadth-first search on $\partial P$, compute the connected 
component $C_u$ of $(\partial P) \cap \mathbb{B}$ that contains f. (If $C_u = C_v$ for two 
directions $u \ne v$, we compute the connected component $C_u$ only once.) 

(iii) Compute $\Delta_u = \min_{f \in C_u} d(O, f)$, where f is a facet of P in $C_u$. 

6. Return $\min_{u \in G'} \Delta_u$. 



Steps (1)-(3) can be performed in $O(m + n + (\log^2(m+n))/\delta)$ time as described 
in the algorithm Approx-Separation. For a given $u \in G'$, we can 
compute $C_u$ in $O((1 + |C_u|) \log(m+n))$ time by locating u in $\mathcal{M}$ and $\mathcal{M}'$ and 
by traversing the two normal diagrams simultaneously. We omit the easy details 
from this abstract. Computing $\Delta_u$ takes $O(|C_u|)$ time. Since we traverse each 
connected component of $(\partial P) \cap \mathbb{B}$ at most once, the total time spent in Step (5) 
is $O((K_\delta + 1/\delta) \log(m+n))$, where $K_\delta$ is the number of facets of P that lie 
within distance $(1+\delta)\pi(A,B)$ from O. The same argument as in Lemma 5 can 
be used to show that the above algorithm computes all those connected components 
of $(\partial P) \cap \mathbb{B}$ that contain a facet within distance $\pi(A,B)$ from O. Hence, 
$\min_{u \in G'} \Delta_u = \pi(A,B)$. We thus obtain the following. 

Theorem 4. Given two convex polytopes A and B in $\mathbb{R}^3$ with m and n facets, 
respectively, and a parameter $\delta > 0$, one can compute $\pi(A,B)$ in time $O(m + n + 
K_\delta \log(m+n) + (\log^2(m+n))/\delta)$, where $K_\delta$ is the number of facets of $B \oplus (-A)$ 
within distance $(1+\delta)\pi(A,B)$ from the origin. 
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Abstract. We describe a kinetic data structure for maintaining a com- 
pact Voronoi-like diagram of convex polygons moving around in the 
plane. We use a compact diagram for the polygons, dual to the Voronoi, 
first presented in [MKS96]. A key feature of this diagram is that its size 
is only a function of the number of polygons and not of their complex- 
ity. We demonstrate a local certifying property of that diagram, akin to 
that of Delaunay triangulations of points. We then obtain a method for 
maintaining this diagram that is output-sensitive and costs O(logn) per 
update. Furthermore, we show that for a set of k polygons with a total 
of n vertices moving along bounded degree algebraic motions, this dual 
diagram, and thus their compact Voronoi diagram, changes combinatori- 
ally n{n^) and 0(fcn^/3(fc)/3(n)) times, where (3{-) is an extremely slowly 
growing function. This compact Voronoi diagram can be used for collision 
detection or retraction motion planning among the moving polygons. 



1 Introduction 

Voronoi diagrams are fundamental data structures in computational geometry 
and have been used in a wide variety of applications that require proximity infor- 
mation among geometric objects. When a Voronoi diagram is defined on objects 
that are not points, all features of these objects can contribute to the diagram’s 
complexity. In this paper we will be concerned with the Voronoi diagram of k 
convex polygons in the plane with a total of n vertices. Even though such a 
diagram defines only k regions (one per object), its total geometric complexity 
is 0{n) — as all polygon vertices can contribute to the linear or parabolic bi- 
sector segments defining the edges separating these regions. Many of the queries 
we may want to use such a diagram for, however, (such as reporting the object 
closest to a query point, or the closest pair among the given objects) refer only 
to the objects themselves and not their individual features. In 1996, McAllister, 
Kirkpatrick, and Snoeyink [MKS96] showed how to compute what they called 
a compact Voronoi diagram, which is a simplified partition of the plane of size 
O(k), but which can still be used to answer proximity queries (such as the above) 
about the objects efficiently. 

Voronoi-based methods have been successfully used to address proximity 
queries in robotics applications, such as collision detection \nrm\ or retraction 
motion planning iHEm. In the robotics setting the convex polygons represent 
obstacles to be avoided. When these obstacles move, we need to update their 
Voronoi diagram accordingly. A natural framework for studying this is the
kinetic data structures (KDS) framework, introduced by Basch, Guibas, and
Hershberger in [BGH97]. In the KDS setting one maintains a geometric structure
under continuous motion of its defining elements through a set of certificates 
proving its correctness. An event queue is maintained for the failure times of 
these certificates and at each event the structure of interest, and its kinetic 
proof, are appropriately updated. It turns out that maintaining kinetically the 
Voronoi diagram of moving points is easy wm, as a set of local conditions 
(inCircle tests) certify the global correctness of the structure and local repairs 
are always possible. 
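
The inCircle test mentioned above is commonly written as the sign of a 3x3 determinant. The following is a minimal sketch of that predicate, not code from the paper; the name `in_circle` and the tuple representation of points are assumptions made for illustration. In a kinetic setting, the certificate for a triangulation edge would fail at the first time this sign changes.

```python
def in_circle(a, b, c, d):
    """Sign test for point d against the circle through a, b, c.

    Assumes a, b, c are given in counterclockwise order.
    Returns a positive value if d is strictly inside the circle,
    zero if d is on the circle, and a negative value if d is outside.
    """
    rows = []
    for (x, y) in (a, b, c):
        dx, dy = x - d[0], y - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    # Cofactor expansion of the 3x3 determinant along the first row.
    return (a1 * (b2 * c3 - b3 * c2)
            - a2 * (b1 * c3 - b3 * c1)
            + a3 * (b1 * c2 - b2 * c1))


# The circle through (0,0), (4,0), (0,4) has center (2,2).
inside = in_circle((0, 0), (4, 0), (0, 4), (1, 1))
on_circle = in_circle((0, 0), (4, 0), (0, 4), (4, 4))
outside = in_circle((0, 0), (4, 0), (0, 4), (5, 5))
```

With integer coordinates the test is exact; with floating-point coordinates a robust implementation would need care near zero.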

In this paper we study the kinetic maintenance of the compact Voronoi di- 
agram for disjoint moving convex polygons in the plane. Though this diagram 
contains much less detailed information than the full Voronoi diagram, it turns 
out that it still has enough structure that it can be certified through a set of 
topologically local certificates and thus maintained as the objects move. This 
kinetic diagram can, for example, be used to track the closest pair among the 
moving objects and therefore perform collision detection. Many collision detection
algorithms for convex bodies rely on determining the closest pair of features
between two objects as the basic building block [LC91, Mir97]. To avoid considering
all $\binom{k}{2}$ pairs of objects, these algorithms invoke a so-called 'broad-phase'
method to select which pairs of objects to test, usually an intersection test on
bounding boxes for the objects. The compact Voronoi diagram elegantly solves
the broad-phase problem and always provides us with a set of $O(k)$ pairs of
objects (those whose regions are adjacent in the diagram) that is guaranteed to 
contain the closest pair. As another example, the diagram can be used to solve 
the retraction motion planning problem, with the help of an additional structure 
that maintains the closest pair of features between two moving convex chains. 

We introduce the basic notations and definitions we need in Section 2. A key
notion is that of junction triangles, which are dual to the degree-3 vertices of the
Voronoi diagram. If we remove the junction triangles from the free space around
the obstacles, the rest of the free space can be decomposed into a set of corridors,
each between two of the convex objects. This structure and the certification of
its correctness are introduced in Section 3. In Section 4 we study the number
of changes to the compact Voronoi diagram when the defining polygons move
pseudo-algebraically in the plane. Using lower envelope and other techniques, we
can show that the number of changes to the diagram is roughly $O(kn^2)$. Finally,
in Section 5 we give applications to collision detection and retraction motion
planning.

2 Preliminaries

A distance function $\delta$ defined on points can be generalized to point sets
$S_1$ and $S_2$ by setting $\delta(S_1,S_2) = \inf_{s_1 \in S_1, s_2 \in S_2} \delta(s_1,s_2)$; when $S_1$ contains only
one point $s$, we can simply write $\delta(s,S_2)$. If $S_1$, $S_2$ are bounded closed sets, their
distance can be realized by a pair of points $(s_1,s_2)$ where $s_1 \in S_1$ and $s_2 \in S_2$.
In what follows we will use $\delta$ to denote the usual Euclidean distance.
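
As a small illustration of the set distance just defined, the sketch below evaluates it for finite point sets, where the infimum becomes a minimum. The function name `set_distance` is hypothetical, and restricting to finite sets is an assumption made only to keep the example runnable; polygons would need a geometric routine instead.

```python
import math


def set_distance(S1, S2):
    """Euclidean distance between two finite point sets.

    Returns (distance, s1, s2), where (s1, s2) is a pair of points
    realizing the distance.
    """
    return min(
        ((math.dist(s1, s2), s1, s2) for s1 in S1 for s2 in S2),
        key=lambda item: item[0],
    )


d, s1, s2 = set_distance([(0, 0), (1, 0)], [(4, 0), (1, 3)])
```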

Consider now a set $\mathcal{P}$ of disjoint convex obstacles in the plane. Under the
above distance function between points and point sets, we can define a Voronoi
diagram for $\mathcal{P}$, called the generalized Voronoi diagram of $\mathcal{P}$, which is the partition
of the free space in the plane according to the nearest obstacle. We assume that
$\mathcal{P}$ is a set of $k$ disjoint convex polygons with $n$ vertices in total. The generalized
Voronoi diagram of $\mathcal{P}$, denoted by $V(\mathcal{P})$, has complexity $O(n)$ and can be built
in time $O(n \log n)$. In [MKS96], a compact representation of $V(\mathcal{P})$ is presented.
The proposed compact Voronoi diagram has size $O(k)$ and can be computed
in $O(k \log n)$ time, assuming each object is represented by the sorted list of its
vertices in clockwise (or counterclockwise) order. Despite its compactness, this
new diagram is as powerful as the generalized Voronoi diagram with regard
to nearest-neighbor and other queries. Here, we will show how to maintain this
compact Voronoi diagram when the convex obstacles are in motion.

From this point on, when we refer to “an obstacle,” we mean a closed convex 
polygon. An edge on a polygon is an open line segment; a feature of a polygon is 
a vertex or an (open) edge; the size of a polygon is the number of vertices on it. 
Two features are adjacent if their closures intersect. Three polygons are collinear 
if there is a line tangent to them simultaneously. Four polygons are cocircular if 
there is a circle tangent to them simultaneously, where a line is tangent to an 
object if it intersects the object only at its boundary, and a circle is tangent to an 
object if it intersects the object only at its boundary and its interior is disjoint 
from the object. Normally in computational geometry we assume that objects 
are in general position, and specifically in our setting that no two polygon edges 
are parallel, no three objects are collinear, and no four objects are cocircular. 
However, when objects can move, it is no longer legitimate to make the above 
assumption. Actually, interesting events happen exactly at the time when such 
a degeneracy occurs. For moving objects, by general position we mean that the 
above events happen at distinct discrete times (never two at once). 

For any point $p$ outside a polygon $P$, there is a unique point $q$ on $P$ realizing
the distance $\delta(p,P)$. Equivalently, there is a unique circle that is centered at $p$
and tangent to $P$. Let us denote this circle by $\omega(p,P)$. The radius of $\omega(p,P)$
clearly equals $\delta(p,P)$. The unique feature (edge or vertex) that contains $q$ is
called the closest feature to $p$. For two disjoint convex polygons $P$ and $Q$, if they
do not have parallel edges, there is a unique pair of points $(p,q)$, where $p \in P$
and $q \in Q$, that realizes $\delta(P,Q)$. We denote by $o(P,Q)$ the midpoint of the
line segment $pq$. At the point $o(P,Q)$, we can place a minimum circle that is
tangent to both $P$ and $Q$.
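
The closest point and closest feature of a polygon with respect to an outside point can be found by a direct scan over the edges. The sketch below is an illustrative linear-time version under the assumption that the query point lies outside the polygon; the name `closest_feature` and the `('vertex', i)` / `('edge', i)` labels are hypothetical, and a logarithmic-time search on the sorted vertex list would be used in practice.

```python
import math


def closest_feature(p, poly):
    """Closest point of a convex polygon boundary to an outside point p.

    poly is a list of vertices in boundary order. Returns
    (distance, point, feature), where feature is ('vertex', i) for
    poly[i] or ('edge', i) for the open edge from poly[i] to poly[i+1].
    """
    best = None
    n = len(poly)
    for i in range(n):
        (ax, ay), (bx, by) = poly[i], poly[(i + 1) % n]
        dx, dy = bx - ax, by - ay
        # Parameter of the orthogonal projection of p onto the edge line.
        t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (dx * dx + dy * dy)
        if t <= 0:
            q, feature = (ax, ay), ('vertex', i)
        elif t >= 1:
            q, feature = (bx, by), ('vertex', (i + 1) % n)
        else:
            q, feature = (ax + t * dx, ay + t * dy), ('edge', i)
        d = math.dist(p, q)
        if best is None or d < best[0]:
            best = (d, q, feature)
    return best


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
edge_case = closest_feature((1, -3), square)
vertex_case = closest_feature((3, 3), square)
```

The returned distance is the radius of the circle $\omega(p,P)$ described above.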

For the polygon set $\mathcal{P}$, denote the convex hull of the vertices of $\mathcal{P}$ by $C(\mathcal{P})$.
Let $\mathcal{F}(\mathcal{P}) = C(\mathcal{P}) \setminus \mathcal{P}$ denote the free space outside the polygons but inside $C(\mathcal{P})$.
For polygons $P_1$ and $P_2$ in $\mathcal{P}$ and points $p \in P_1$ and $q \in P_2$, the edge $pq$ is called
a free edge if it does not intersect the interior of any polygon in $\mathcal{P}$. Two
non-intersecting free edges $pq$ and $p'q'$, where points $p,p' \in P_1$ and $q,q' \in P_2$, define
a corridor with respect to $\mathcal{P}$ if and only if the region bounded by $pq$, $p'q'$, and
the convex chains $pp'$ on $P_1$ and $qq'$ on $P_2$ contains no polygons of $\mathcal{P}$. Sometimes
we will talk about the corridor of two polygons $P_1$ and $P_2$, which is the corridor
with respect to $\{P_1, P_2\}$ defined by their outer common tangents.

For two disjoint convex polygons $P_1$ and $P_2$, the bisector between them is
defined to be the locus of points that are equidistant from them. It is well known
that the bisector is an unbounded Jordan curve that consists of $O(|P_1| + |P_2|)$
line segments and parabolic arcs. For convenience of presentation, we also add an
orientation to the bisector. The oriented bisector $\pi(P_1,P_2)$ (abbreviated $\pi_{12}$)
is the bisector with the orientation so that $P_1$ is to the left of $\pi_{12}$. Since the
bisector is an oriented unbounded Jordan curve, we can define a linear ordering
$\preceq$ on the points on $\pi_{12}$. For two points $p,q \in \pi_{12}$, we say that $p \preceq q$ if $p$ is
encountered before $q$ when traveling on $\pi_{12}$ consistently with the orientation
of $\pi_{12}$ (Figure 1(a)). We can parameterize the bisector $\pi_{12}$ as follows: for a
point $p \in \pi_{12}$, with $o = o(P_1,P_2)$, if $p \preceq o$, then $\zeta_{12}(p) = -(\delta(p,P_1) - \delta(o,P_1))$; otherwise,
$\zeta_{12}(p) = \delta(p,P_1) - \delta(o,P_1)$. When there is no degeneracy, the function $\zeta_{12}$ is a
one-to-one and onto mapping from $\pi_{12}$ to $\mathbb{R}$ [MKS96]. Clearly, $\zeta_{12} = -\zeta_{21}$.

For three convex polygons $P_1$, $P_2$ and $P_3$ in general position, it is known that
the bisectors $\pi_{12}$ and $\pi_{13}$ intersect at most twice. For an intersection $v$ between
$\pi_{12}$ and $\pi_{13}$, the circle $\omega(v,P_1)$ is tangent to $P_1$, $P_2$ and $P_3$. We say that $v$
is defined by the ordered triplet $(P_1,P_2,P_3)$ if $P_1$, $P_2$ and $P_3$ are tangent to
$\omega(v,P_1)$ in counterclockwise direction. Then there is at most one vertex defined
by $(P_1,P_2,P_3)$.

A point $p \in \pi_{12}$ is said to be shaded by $P_3$ if $\omega(p,P_1) \cap P_3 \neq \emptyset$. Let $\mathcal{S}_{12,3}$
denote the set of the points on $\pi_{12}$ shaded by $P_3$. Consider the set of parameter
values represented by the shaded portion, $\{\zeta_{12}(p) \mid p \in \mathcal{S}_{12,3}\}$, which we denote
by $S_{12,3}$. Since bisectors $\pi_{12}$ and $\pi_{13}$ intersect at most twice, the shaded set $S_{12,3}$
must have the form $\emptyset$, $(-\infty,a]$, $[b,+\infty)$, $(-\infty,a] \cup [b,+\infty)$, $[a,b]$, or $(-\infty,+\infty)$,
where $a$, $b$ correspond to the parameter values of the Voronoi vertices defined by
$P_1$, $P_2$ and $P_3$.

When $S_{12,3}$ has the form $(-\infty,a]$ or $(-\infty,a] \cup [b,+\infty)$, we say that $\pi_{12}$ is
half-shaded by $P_3$ at $a$. Notice that if neither $\pi_{12}$ nor $\pi_{21}$ is half-shaded by $P_3$,
then $S_{12,3}$ must have the form $\emptyset$ or $[a,b]$ where $a < b$. The following fact is useful
later in bounding the number of combinatorial changes.

Lemma 1. The shaded set $S_{12,3}$ is of the form $[a,b]$, where $a < b$, if and only
if $P_3$ lies completely inside the corridor between $P_1$ and $P_2$.

The bisector $\pi_{12}$ divides the plane into two regions that contain $P_1$ and $P_2$,
respectively. Let us denote by $r_{P_1P_2}$ (or simply $r_{12}$) the region that contains $P_1$. Each
point in $r_{12}$ is closer to $P_1$ than to $P_2$. For $P_i \in \mathcal{P}$, the Voronoi region $V(P_i)$ of
$P_i$ is then defined to be $\bigcap_{j \neq i} r_{ij}$, the set of points that are closer to $P_i$ than
to any other polygon in $\mathcal{P}$. Each Voronoi region is connected and bounded by
portions of bisectors. Therefore, the corresponding Voronoi diagram $V(\mathcal{P})$ is a
planar map (Figure 1(b)). Two convex polygons are adjacent in $V(\mathcal{P})$ if their
Voronoi regions share a boundary.

Lemma 2. A point $p \in \pi_{12}$ is in $V(\mathcal{P})$ if and only if $\zeta_{12}(p)$ is not in the interior
of $S_{12,i}$, for any $i \neq 1,2$.

In $V(\mathcal{P})$ there are two types of vertices, as illustrated in Figure 1(b). Vertices
of degree three are junction vertices, corresponding to the intersections between 
two bisectors. The remaining degree two vertices are interior vertices, lying along 
a single bisector where the closest feature pair changes. At each junction vertex, 
we can grow a circle that touches three polygons and is free of other polygons. 
This circle is called a witness circle for that junction vertex. 






Fig. 1. (a) The oriented bisector $\pi_{12}$ between $P_1$ and $P_2$. The pair $(p,q)$ is the closest
pair of points between $P_1$ and $P_2$. According to the ordering, $v \prec o \prec u$. (b) The
generalized Voronoi diagram where the solid vertices are junction vertices and the
hollow ones are interior vertices. Not all hollow vertices are shown. (c) The compact
Voronoi diagram with only the junction vertices remaining. A point at $\infty$ is added to
compactify the diagram.



The junction vertices capture the topology of $V(\mathcal{P})$. Furthermore, while the
total number of vertices in $V(\mathcal{P})$ can be $O(n)$, the number of junction vertices is
at most $2k - 4$, where $k$ is the number of polygons in $\mathcal{P}$. The compact Voronoi
diagram is based on these junction vertices, plus an additional imaginary vertex
at $\infty$: we form an edge between two vertices if and only if there is a portion of
a bisector between them with no junction vertex in between (Figure 1(c)). If
a portion of a bisector extends to infinity, we connect the vertex to the node at
$\infty$. We refer to this diagram as the compact Voronoi diagram of $\mathcal{P}$. In [MKS96],
it is shown that the compact Voronoi diagram has many useful properties and
that it can be computed in $O(k \log n)$ time. In the following, we study how to
maintain the compact Voronoi diagram for moving polygons.

3 Maintaining the Compact Voronoi Diagram for Moving Obstacles

Maintaining the traditional Voronoi diagram of moving points in the plane,
usually represented via its dual Delaunay triangulation, is straightforward.
This is because a local condition, the 'empty circle' property for triangulation
edges with pairs of adjacent triangles, implies the global correctness of
the diagram. When one of these conditions fails due to object motion, a local
transformation (the 'edge-flip') is sufficient to restore local, and therefore global,
correctness. The same principle was exploited to maintain the power diagrams
of moving balls in ESM]. 

To pursue this idea in our current context, we first define a dual 'triangulation'
of the (compact) Voronoi diagram. Recall that $\mathcal{F}(\mathcal{P})$ denotes the free space
inside the convex hull of $\mathcal{P}$. A triangle is called a junction triangle if it is incident
to three objects in $\mathcal{P}$. A corridor is a four-sided portion of the free space delimited
by two polygons on two opposite sides. With slight abuse of notation, we
call a cell decomposition with triangles and corridors as primitive cells a
triangulation. A triangulation $\mathcal{T}(\mathcal{P})$ of $\mathcal{P}$ is a cell decomposition of $\mathcal{F}(\mathcal{P})$ into junction
triangles and corridors with all the vertices of $\mathcal{T}$ on polygon boundaries, that
satisfies the following properties:

- All the (junction) triangles and corridors in $\mathcal{T}$ are interior disjoint,

- The vertices of triangles and corridors in $\mathcal{T}$ are on polygon boundaries,

- The triangles in $\mathcal{T}$ do not intersect the interior of any polygon $P \in \mathcal{P}$, and

- The union of triangles and corridors in $\mathcal{T}$ covers $\mathcal{F}(\mathcal{P})$.

For a given polygon set $\mathcal{P}$, the number of junction triangles in a triangulation
is specified by the following fact.

Lemma 3. For $k$ disjoint convex polygons, if their convex hull contains $h$
non-polygon edges, then any triangulation of their free space contains exactly $2k - h - 2$
junction triangles.

We can form a triangulation of $\mathcal{F}(\mathcal{P})$ based on the compact Voronoi diagram
of $\mathcal{P}$ as follows. For each junction vertex in the Voronoi diagram, its witness
circle touches three polygons. We connect the three contact points to form a
junction triangle corresponding to each junction vertex. It is easy to prove that
the junction triangles thus formed do not intersect polygon interiors and do not
interpenetrate each other.

If we remove these junction triangles from $\mathcal{F}(\mathcal{P})$, we will have a set of
disconnected regions where each connected component is a corridor between two
convex polygons. The corridors and junction triangles together form a triangulation
that we will call the Delaunay triangulation of $\mathcal{P}$ and denote by $\mathcal{D}(\mathcal{P})$
(Figure 2). If a junction triangle $\Delta$ is incident to three features $f_1$, $f_2$ and $f_3$
on different polygons, we say that the triplet $(f_1,f_2,f_3)$ defines $\Delta$. When objects
start to move, the Delaunay triangulation changes combinatorially if such
a triplet changes. In the following, we will show how to maintain $\mathcal{D}(\mathcal{P})$ when the
objects in $\mathcal{P}$ move. We assume that during the motion, all the objects remain
disjoint. As we will see later, we can use $\mathcal{D}(\mathcal{P})$ to detect collisions.

What is crucial for the maintenance of $\mathcal{D}(\mathcal{P})$ is its local certification property.
For any junction triangle $\Delta$, if $\Delta$'s circumcircle (denoted by $\hat{\Delta}$) does not
intersect the interior of any $P \in \mathcal{P}$, $\Delta$ is called a Delaunay triangle. Clearly, a
triangulation is a Delaunay triangulation if and only if all of its junction triangles
are Delaunay triangles. To state the local property we seek, we define a local
Delaunay condition for a junction triangle. A corridor is said to be adjacent to
a junction triangle if they share an edge. Two junction triangles are adjacent if
they are adjacent to the same corridor. Each junction triangle $\Delta$ then has up to
three adjacent corridors and three adjacent triangles (Figure 3(a)). The neighboring
polygons for $\Delta$ are defined to be the polygons incident to it or to one of its
adjacent junction triangles. We say that a junction triangle is locally Delaunay if
its circumcircle does not intersect the interior of any of its neighboring polygons.
Clearly, all the junction triangles of $\mathcal{D}(\mathcal{P})$ are locally Delaunay.
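
The local Delaunay condition can be checked numerically once the three contact points and the neighboring polygons are known. The following is a minimal sketch under stated assumptions, not the paper's certificate formulation: the function names are hypothetical, polygons are vertex lists, the circumcenter is assumed to lie outside every neighboring polygon, and a tolerance `eps` absorbs the fact that the three incident polygons touch the circumcircle.

```python
import math


def circumcircle(a, b, c):
    """Center and radius of the circle through three non-collinear points."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("contact points are collinear")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)


def point_segment_distance(p, a, b):
    """Distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def polygon_distance(p, poly):
    """Distance from an outside point p to the boundary of a polygon."""
    n = len(poly)
    return min(point_segment_distance(p, poly[i], poly[(i + 1) % n])
               for i in range(n))


def is_locally_delaunay(contacts, neighbors, eps=1e-9):
    """True if the circumcircle of the junction triangle with the given
    contact points does not cut into any neighboring polygon."""
    center, radius = circumcircle(*contacts)
    return all(polygon_distance(center, poly) >= radius - eps
               for poly in neighbors)


contacts = [(0, 0), (4, 0), (0, 4)]
far = [(10, 10), (11, 10), (11, 11), (10, 11)]
near = [(2, 4), (3, 6), (1, 6)]
```

Here `far` stays clear of the circumcircle while `near` has a vertex inside it, so adding `near` to the neighbor list violates the condition.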



Fig. 2. The Delaunay triangulation of the compact Voronoi diagram. The regions
of free space between adjacent junction triangles are corridors.







Fig. 3. (a) A junction triangle and its adjacent corridors and polygons. When $\Delta$ is
locally Delaunay, its circumcircle $\hat{\Delta}$ is covered by the adjacent corridors and the
circumcircles of its adjacent junction triangles, where the boundary of this region is
thickened in the figure. (b) Proof of the local property.



We will show that the other direction also holds, which is the counterpart of
the local certification property of the Delaunay triangulation for points.

Lemma 4 (Local Property). If all the junction triangles of a triangulation
$\mathcal{T}(\mathcal{P})$ are locally Delaunay, then $\mathcal{T}(\mathcal{P})$ is the Delaunay triangulation of $\mathcal{P}$.

Proof. First, we observe that if a triangle $\Delta$ is locally Delaunay, then its
circumcircle $\hat{\Delta}$ is covered by the union of $\Delta$, $\Delta$'s adjacent corridors, and the
circumcircles of $\Delta$'s adjacent triangles (Figure 3(a)).

Using the above fact, we will prove the local property by contradiction. Assume
that $\mathcal{T}$ is not a Delaunay triangulation. Then there must exist a point
$p$ interior to one of the polygons in $\mathcal{P}$ and a junction triangle $\Delta_1 \in \mathcal{T}$ so that
$p$ lies in the circumcircle $\hat{\Delta}_1$ of $\Delta_1$. For an edge $e_i$ of $\Delta_1$, denote by $\hat{e}_i$ the sector
of $\hat{\Delta}_1$ bounded by the chord $e_i$. By the definition of triangulation, $\Delta_1$ does not
intersect any polygon interior. Therefore, $p \in \hat{\Delta}_1 \setminus \Delta_1$, i.e., $p$ lies in the sector
bounded by a unique edge $ab$ of $\Delta_1$. Denote the angle $\angle apb$ by $\theta(p,\Delta_1)$. We note
that $\theta(p,\Delta_1)$ is the maximum angle $p$ can make with the endpoints of edges of $\Delta_1$.
Among all the junction triangles whose circumcircles contain $p$, let $\Delta_1$ be the one
with the maximum $\theta(p,\Delta_1)$ and let $e_1$ be the edge of $\Delta_1$ so that $p \in \hat{e}_1$.

Now consider the corridor $C$ adjacent to $\Delta_1$ at $e_1$. Let the other edge bounding
$C$ be $e_2$ and the junction triangle incident to $e_2$ be $\Delta_2$. Since the corridor
is the union of a set of triangles in $\mathcal{T}$, it cannot contain $p$. Thus, $p \in \hat{e}_1 \setminus C$. This
implies that $\hat{e}_1$ intersects the edge $e_2$, since $\hat{\Delta}_1$ does not intersect the polygons
incident to $\Delta_1$. By the local Delaunay property of $\Delta_1$, $\hat{\Delta}_1$ is covered by the
union of corridors and the circumcircles of adjacent triangles, i.e., $\hat{e}_1 \subset \hat{\Delta}_2 \cup C$,
or $\hat{e}_1 \setminus C \subset \hat{\Delta}_2$. Therefore, $p \in \hat{\Delta}_2$. We can further conclude that $p$ is not in
$\hat{e}_2$ because $\hat{e}_1 \cap \hat{e}_2 \subset C$. Suppose that the endpoints of $e_2$ are $c$ and $d$ and that
$e_2$ intersects the boundary of $\hat{\Delta}_1$ at points $c'$ and $d'$. Then clearly the angle
$\theta(p,\Delta_2) \geq \angle cpd > \angle c'pd' > \angle apb = \theta(p,\Delta_1)$, contradicting the maximality of
$\theta(p,\Delta_1)$ (Figure 3(b)).

By the above local property, to certify that a triangulation is the Delaunay
triangulation, it is sufficient to certify that all the junction triangles are locally
Delaunay, i.e., the circumcircle of each junction triangle does not intersect the
interior of any of its neighboring polygons. Among the (up to six) neighboring
polygons of a junction triangle $\Delta$, three are incident to $\Delta$ and the others are
incident to one of the junction triangles adjacent to $\Delta$. We consider these two
cases separately. In the following, we assume that $\Delta$ is defined by the triplet of
features $(f_1,f_2,f_3)$ where $f_1,f_2,f_3$ are features on $P_1$, $P_2$ and $P_3$, respectively.

To certify that the circumcircle $\hat{\Delta}$ of $\Delta$ does not intersect the interior of its
incident polygons it suffices, by convexity, to certify that $\hat{\Delta}$ does not intersect
any features adjacent to $f_1$, $f_2$ or $f_3$. All these certificates can be written as
algebraic conditions in terms of the coordinates of a constant number of polygon
vertices. When a certificate fails, say, when $\hat{\Delta}$ intersects the feature $f_1'$, an
adjacent feature of $f_1$, we then simply update the triplet of features from
$(f_1,f_2,f_3)$ to $(f_1',f_2,f_3)$. The topological structure of $V(\mathcal{P})$ is not affected by
such events.

For those neighboring polygons that are not incident to $\Delta$, suppose that
$\Delta_1$ is a junction triangle adjacent to $\Delta$ and it is incident to $P_1$, $P_2$ and $P_4$.
To certify that the circumcircle $\hat{\Delta}$ of $\Delta$ does not intersect $P_4$, it suffices to certify
that $\hat{\Delta}$ does not contain the vertex of $\Delta_1$ that lies on $P_4$, because the circumcircle
$\hat{\Delta}_1$ of $\Delta_1$ is disjoint from the interior of $P_1$, $P_2$ and $P_4$. When such a certificate
fails, it must be the case that the circumcircles of $\Delta$ and $\Delta_1$ are coincident. In
other words, this happens when $P_1, P_2, P_3, P_4$ are cocircular. We call such events
cocircularity events. We also note the special case when the junction triangle is on
the convex hull. In this case, the corresponding event is when the polygons become
collinear and the common tangent line supports the convex hull of $\mathcal{P}$, or vice versa.
Such an event decreases or increases the number of junction triangles by one. For
such collinearity events, we can watch each junction triangle with an edge on the
boundary of $C(\mathcal{P})$ to see when such a triangle degenerates, and each pair of adjacent
non-polygonal convex hull edges to see when such a triangle emerges. In the
following, we will focus on the cocircularity events.

For $\Delta$ and $\Delta_1$ to have the same circumcircle, they must share an edge, say $e_1$,
which is incident to $P_1$ and $P_2$; otherwise, we would contradict the fact that
the circumcircles of $\Delta$ and $\Delta_1$ do not intersect the interior of $P_1$ and $P_2$. After such
a cocircularity event happens, the triangulation is no longer a valid Delaunay
triangulation. To fix it, we simply delete the edge $e_1$ and add the other diagonal of
the quadrangle formed by the union $\Delta \cup \Delta_1$ (Figure 4).
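
At the combinatorial level, the edge-flip just described replaces the shared edge by the other diagonal. The sketch below shows only this relabeling, assuming junction triangles are represented as tuples of polygon labels; the function name `flip` is hypothetical, and the contact points and corridor bookkeeping that a real kinetic structure would also update are omitted.

```python
def flip(t1, t2):
    """Edge-flip on two junction triangles given as tuples of polygon labels.

    t1 and t2 must share exactly two labels (the polygons incident to the
    shared edge). Returns the two triangles obtained by replacing the
    shared edge with the other diagonal of their union.
    """
    shared = set(t1) & set(t2)
    if len(shared) != 2:
        raise ValueError("triangles do not share an edge")
    a, b = sorted(shared)
    (c,) = set(t1) - shared  # polygon of t1 opposite the shared edge
    (d,) = set(t2) - shared  # polygon of t2 opposite the shared edge
    return (c, d, a), (c, d, b)


# Triangles on (P1, P2, P3) and (P1, P2, P4) share the edge between P1 and P2.
new_triangles = flip((1, 2, 3), (1, 2, 4))
```

After the flip, the two triangles share the edge between $P_3$ and $P_4$ instead.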





Fig. 4. The cocircularity event and edge-flip 
operation. 



We shall see that the above algorithm maintains a set of junction triangles
that satisfy the local Delaunay property. Further, note that only the collinearity
events may change the number of junction triangles: we decrease or increase the
number of junction triangles by one, depending on whether a convex hull edge
appears or disappears. The other types of events do not affect the number of
junction triangles. Therefore, we also maintain the right number of junction
triangles. By Lemma 3 and the local property, we can conclude that the above
algorithm correctly maintains $\mathcal{D}(\mathcal{P})$, and thus the compact Voronoi diagram.

We have shown above how the compact Voronoi diagram can be maintained.
Since the number of certificates is the sum of the number of junction triangles and
the non-polygonal convex hull edges, the above structure has $O(k)$ certificates.
The processing time for each event is $O(\log k)$, dominated by the processing
cost of the event queue. However, the structure is not local, as one polygon may
be involved in up to $O(k)$ certificates, although on average each polygon is
involved in $O(1)$ certificates. As for the events processed, the above algorithm is
output-sensitive in the sense that every event changes either the features that
define a junction Voronoi vertex or the topology of the compact Voronoi diagram.
Thus, we have the following.



Theorem 1. The compact Voronoi diagram of $\mathcal{P}$ can be maintained by a kinetic
data structure in an output-sensitive manner. In the structure, the number of
certificates is $O(k)$, and each event can be processed in $O(\log k)$ time.
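
The $O(\log k)$ per-event cost comes from a priority queue over certificate failure times. A minimal sketch of such an event queue is given below, assuming Python's standard `heapq` module and lazy deletion of stale entries; the class name `EventQueue` and the string certificate identifiers are hypothetical, and the computation of failure times from the motions is not shown.

```python
import heapq
import itertools


class EventQueue:
    """Priority queue of certificate failure times with lazy deletion."""

    def __init__(self):
        self._heap = []
        self._alive = {}  # certificate id -> token of its current entry
        self._counter = itertools.count()

    def schedule(self, cert_id, time):
        """Insert or reschedule a certificate; older entries become stale."""
        token = next(self._counter)
        self._alive[cert_id] = token
        heapq.heappush(self._heap, (time, token, cert_id))

    def cancel(self, cert_id):
        """Drop a certificate that is no longer part of the proof."""
        self._alive.pop(cert_id, None)

    def pop(self):
        """Return (time, cert_id) of the next valid event, or None."""
        while self._heap:
            time, token, cert_id = heapq.heappop(self._heap)
            if self._alive.get(cert_id) == token:
                del self._alive[cert_id]
                return time, cert_id
        return None


queue = EventQueue()
queue.schedule("a", 3.0)
queue.schedule("b", 1.0)
queue.schedule("c", 2.0)
queue.cancel("c")         # certificate removed by an earlier repair
queue.schedule("a", 0.5)  # certificate "a" rescheduled with a new failure time
events = [queue.pop(), queue.pop(), queue.pop()]
```

Lazy deletion keeps each operation logarithmic in the heap size at the cost of leaving stale entries in the heap until they surface.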

According to [BGH97], the structure we described is compact, responsive,
and efficient. However, it is not local, as one polygon may be involved in up to $O(k)$
certificates. Although the algorithm maintains the compact Voronoi diagram in
an output-sensitive manner, we have not answered the question of how many
events the data structure may need to process for algebraic polygon motions.
To complete our analysis, in the next section we will analyze the number of
combinatorial changes of the compact Voronoi diagram.

4 Combinatorial Changes of the Compact Voronoi Diagram

In this section, we will study the number of combinatorial changes of the compact
Voronoi diagram. For the purpose of analysis, we assume that the polygons move
rigidly in pseudo-algebraic motion with constant degree. That is, each certificate
involving the same set of features can fail only a constant number of times during
the entire motion. This assumption is very general, as it includes the
motions that can be represented by algebraic or rational functions of constant
degree.

Observe that an event happens when there are four features cocircular or
three collinear. This gives us an immediate upper bound of $O(n^4)$ for constant
degree pseudo-algebraic motion. However, we will show a significantly
smaller upper bound of $O(kn^2\beta(k)\beta(n))$. Here, $\lambda(n)$ is the maximum length
of an $(n,s)$ Davenport-Schinzel sequence for some constant $s$, and $\beta(n) = \lambda(n)/n$
is an extremely slowly growing function and can be regarded as close to a constant
for all reasonable values of $n$. On the other hand, there are examples that
show an $\Omega(n^2)$ lower bound. The lower bound can be realized by three convex
polygons, each with $n/3$ edges. In Figure 5(a), when the polygon $P$ moves
horizontally, the triplets of features that define the junction Voronoi vertices change
$\Omega(n^2)$ times.
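
To make the Davenport-Schinzel condition concrete: an $(n,s)$ Davenport-Schinzel sequence over $n$ symbols has no two equal adjacent symbols and no alternating subsequence $a \ldots b \ldots a \ldots b \ldots$ of length $s+2$ for distinct symbols $a$, $b$. The sketch below is an illustrative checker; the function names are hypothetical. For the small orders, the known values are $\lambda_1(n) = n$ and $\lambda_2(n) = 2n - 1$.

```python
from itertools import permutations


def longest_alternation(seq, a, b):
    """Length of the longest subsequence a, b, a, b, ... starting with a."""
    want, length = a, 0
    for x in seq:
        if x == want:
            length += 1
            want = b if want == a else a
    return length


def is_davenport_schinzel(seq, s):
    """True if seq is a Davenport-Schinzel sequence of order s."""
    if any(x == y for x, y in zip(seq, seq[1:])):
        return False  # equal adjacent symbols are not allowed
    symbols = set(seq)
    return all(longest_alternation(seq, a, b) < s + 2
               for a, b in permutations(symbols, 2))


# Over 3 symbols, an order-2 sequence of length 2*3 - 1 = 5 exists.
ok_order2 = is_davenport_schinzel([1, 2, 1, 3, 1], 2)
bad_order2 = is_davenport_schinzel([1, 2, 1, 2], 2)  # contains a b a b
bad_order1 = is_davenport_schinzel([1, 2, 1], 1)     # contains a b a
```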







Fig. 5. Lower bound constructions: (a) shows an example with $\Omega(n^2)$ changes to the
compact Voronoi diagram, (b) shows an example with $\Omega(k^2)$ changes to a pair of
objects.



Now, our main task is to prove the following theorem. 

Theorem 2. Suppose that $\mathcal{P}$ is a set of $k$ disjoint convex polygons with $n$
vertices in total. When all the polygons in $\mathcal{P}$ move algebraically and without
colliding with each other, the compact Voronoi diagram changes
$O(n\lambda(n)\lambda(k)) = O(kn^2\beta(n)\beta(k))$ times.

In [HKK92], a similar upper bound is proved for the number of changes of
the Voronoi diagram of k sets of points, each moving independently and rigidly. 
We will use similar techniques to prove our upper bound. However, the presence 
of edges adds complexity, and additional insights are required to complete the 
proof. 

We first give an example to show the difference from the points case. In the
proof of an $O(k^3\beta(k))$ bound on the number of changes of the Voronoi diagram
of $k$ moving points, the key step is to bound the number of changes involving two
points by $O(k\beta(k))$; this implies that the total number of changes is bounded
by $O(k^3\beta(k))$. This fact, however, is no longer true for polygons. Consider the
example shown in Figure 5(b), where $P_1$ and $P_2$ are two long, parallel line
segments and $S_1$ and $S_2$ are two groups of points. Let $\ell$ be the bisector line of
$P_1$ and $P_2$. The set $S_1$ consists of $k$ points lying on $\ell$ so that their Voronoi
regions relative to $P_1$ and $P_2$ are mutually disjoint. The other group of points $S_2$
lies slightly above $\ell$ and is spaced compactly so that the points, together with their
Voronoi regions, can be accommodated between any two points of $S_1$. Now,
imagine the second group of points moving from $x = -\infty$ to $x = +\infty$ along the
line $\ell$; there will be $\Omega(k^2)$ changes of the Voronoi edges incident to $P_1$ and $P_2$.
As this example shows, we can no longer bound the number of changes related
to a pair of polygons by roughly $O(k)$ as was done in the point case. However, as
we shall see later, there is still a way to charge each change to a carefully chosen
polygon pair so that no pair receives too many charges.

To proceed, we first consider how Voronoi vertices move when polygons move.
Consider three convex polygons, i.e., $\mathcal{P} = \{P_1,P_2,P_3\}$. We write $n_i = |P_i|$,
where $1 \leq i \leq 3$, and $n = n_1 + n_2 + n_3$. For three objects, a combinatorial change
can happen to $\mathcal{D}(\mathcal{P})$ only when the triplets of features that define the Voronoi
vertices change. For the number of combinatorial changes for constant degree
algebraic motion, we have the following bound.

Lemma 5. The compact Voronoi diagram of three convex polygons changes
$O((n_1n_2 + n_1n_3 + n_2n_3)\beta(n))$ times.

Proof. There are two types of events causing a combinatorial change.

1. The first type (type I) occurs when there is a common outer tangent line to
   $P_1$, $P_2$ and $P_3$. Such events cause a Voronoi vertex to appear or disappear.

2. The second type (type II) occurs when there is a circle touching four features,
   i.e., when four features become cocircular, and this circle is free of the interior
   of the polygons. Such events change the triplet of features that define a
   particular Voronoi vertex.

It is easy to see that there can be at most $O(n_1n_2 + n_1n_3 + n_2n_3)$ type I events.
For the type II events, among the four features that are cocircular, there must
be two, say $f_1$ and $f_2$, from the same polygon. By convexity, they are an
edge and an endpoint of the edge. Therefore, when this event happens, the
circle must be tangent to an edge at its endpoint. Suppose that $e$ is an edge
of $P_1$ and $p$ is an endpoint of $e$. We parameterize all the circles tangent to $e$
at $p$ by their radii (we only consider those circles whose centers are outside of
the polygon), and denote this set by $\mathcal{O}$. Then, for any feature $f$, we define a
function $\delta_f(t)$ to be the radius of the circle in $\mathcal{O}$ which touches $f$ at time $t$.
When $f$ moves algebraically, $\delta_f(t)$ is a rational function. Consider the family
of functions $\Xi(P_2) = \{\delta_f \mid f \text{ is a feature of } P_2\}$. Clearly, the lower envelope of
the arrangement of $\Xi(P_2)$ corresponds to the circles that touch $P_2$. Similarly,
we can define $\Xi(P_3)$. Then, a type II event that involves $e$ and $p$ corresponds to
an intersection point between the lower envelopes of $\Xi(P_2)$ and $\Xi(P_3)$, whose






complexity is bounded by $\lambda(n_2) + \lambda(n_3)$. Therefore, the number of type II events
that involve two features of $P_1$ is bounded by $O(n_1(\lambda(n_2) + \lambda(n_3))) = O((n_1n_2 +
n_1n_3)\beta(n))$. Repeating the above argument for $P_2$ and $P_3$ proves the lemma.
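
The lower-envelope argument can be illustrated numerically: at each time, the envelope of a family of functions is attained by one of them, and the combinatorial complexity counts how often the minimizing function changes. The sketch below samples the envelope at given times; it is only an approximation for illustration (an exact method would compute intersection times), and the function names and the three sample functions are assumptions, not the paper's radius functions.

```python
def envelope_labels(funcs, times):
    """Index of the function attaining the lower envelope at each time."""
    labels = []
    for t in times:
        values = [f(t) for f in funcs]
        labels.append(min(range(len(funcs)), key=values.__getitem__))
    return labels


def count_breakpoints(labels):
    """Number of changes of the minimizing function along the samples."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


# Three simple functions standing in for the radius functions of features.
funcs = [lambda t: t, lambda t: 1.0, lambda t: 3.0 - t]
times = [0.0, 0.5, 1.5, 2.5, 3.0]
labels = envelope_labels(funcs, times)
breaks = count_breakpoints(labels)
```

On the interval $[0,3]$ the envelope is attained first by $t$, then by the constant $1$, then by $3 - t$, giving two breakpoints.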

Because of the way that we parameterize the bisector, we also need the
following fact.

Lemma 6. The distance function $\delta(P_1,P_2)$ consists of $O(n_1n_2)$ rational arcs
with constant degree when $P_1$, $P_2$ move algebraically with constant degree.

For three polygons $P_1$, $P_2$ and $P_3$, recall that $\mathcal{S}_{12,3}$ is the set of points that
are on $\pi_{12}$ and shaded by $P_3$. The parameters of this shaded set, $S_{12,3}$, may
be of the forms $\emptyset$, $(-\infty,a]$, $[b,+\infty)$, $(-\infty,a] \cup [b,+\infty)$, $[a,b]$, or $(-\infty,+\infty)$. We
define the function $\phi_{12,3}(t)$ as follows. If at time $t$, $\pi_{12}$ is half-shaded by $P_3$ at $a$,
then $\phi_{12,3}(t)$ is defined to be $a$. Otherwise, it is undefined. Since an endpoint of
$S_{12,3}$ corresponds to the parameter value of a Voronoi vertex when considering
$P_1$, $P_2$ and $P_3$ only, Lemmas 5 and 6 say that $\phi_{12,3}$ consists of
$O((n_1n_2 + n_1n_3 + n_2n_3)\beta(n))$ pieces of rational arcs. Likewise, we define the function
$\phi_{ij,l}$ for each triplet $i,j,l$.

For a pair of polygons $P_i$ and $P_j$, we have a family of functions $\Phi_{ij} =
\{\phi_{ij,l} \mid l \neq i,j\}$. Let $\Gamma(\Phi)$ denote the upper envelope of a set of functions $\Phi$.
We first show the following.

Lemma 7. Each cocircular event can be charged to a break point on $\Gamma(\Phi_{ij})$ or
the overlay between $\Gamma(\Phi_{ij})$ and $-\Gamma(\Phi_{ji})$, for some $i \neq j$.

For the moment, let us assume that the above lemma is true and prove
Theorem 2.

Proof. (Theorem 2). As we have already discussed, there are two types of
events: type I, when the feature triplets defining Voronoi vertices change, and
type II, when four polygons become cocircular. By Lemma 5, the number of type
I events is bounded by:

$\sum_{i,j,l} O((n_in_j + n_in_l + n_jn_l)\beta(n)) = O(kn^2\beta(n))$.

For type II events, by Lemma 7 they can be charged to break points on the
lower or upper envelopes of the $\Phi_{ij}$'s or their overlay. Each $\phi_{ij,l}$ consists of
$O((n_in_j + n_in_l + n_jn_l)\beta(n))$ pieces of rational arcs. The complexity of $\Gamma(\Phi_{ij})$ is
then bounded by:

$O\bigl(\beta(k) \sum_{l} (n_in_j + (n_i + n_j)n_l)\beta(n)\bigr) = O((kn_in_j + (n_i + n_j)n)\beta(n)\beta(k))$.

The overlay between two envelopes has the same order of complexity. Thus,
the number of cocircular events is bounded by

$\sum_{i,j} O((kn_in_j + (n_i + n_j)n)\beta(n)\beta(k)) = O(kn^2\beta(n)\beta(k))$.

Now, the only piece left is the proof of Lemma 7.

Proof (of Lemma 7). Suppose that at time $t$, a cocircularity event happens to
$P_1$, $P_2$, $P_3$ and $P_4$. We claim that among those four polygons, there always exist
two, say $P_1$ and $P_2$, so that $S_{12,3}$ and $S_{12,4}$ are not closed intervals of the form
$[a, b]$. Now pick any two polygons, say $P_1$ and $P_2$. If neither of the shaded sets
$S_{12,3}$ nor $S_{12,4}$ is a bounded arc, then the pair of polygons $P_1, P_2$ is what we
want. Otherwise, suppose $S_{12,3}$ is bounded. By an earlier lemma, $P_3$ is completely
inside the corridor between $P_1$ and $P_2$. Thus $P_2$ is not inside the corridor between
$P_1$ and $P_3$. If $P_4$ is not inside the corridor between $P_1$ and $P_3$, then the pair
$P_1, P_3$ satisfies the requirement. Otherwise, $P_4$ is inside the corridor between
$P_1$ and $P_3$. In this case, $P_3, P_1$ are the desired pair. Let us rename the pair with
the above property $P_1, P_2$.

Suppose $v$ is the coincident Voronoi vertex at time $t$. Let $x = C_{12}(v)$. By
an earlier lemma, at time $t$ there cannot be any other $P_i$ ($i \neq 1, 2, 3, 4$) so that
$x$ is in the interior of $S_{12,i}$. Since $S_{12,3}$ is not a closed interval, either
$\phi_{12,3}(t) = x$ or $\phi_{21,3}(t) = -x$. The same argument applies to $\phi_{12,4}(t)$ and
$\phi_{21,4}(t)$. The fact that $v$ is not shaded by any other $P_i$ allows us to charge such
an event either to a break point on $\Gamma(\Phi_{12})$ or $\Gamma(\Phi_{21})$, or to an intersection
between $\Gamma(\Phi_{12})$ and $-\Gamma(\Phi_{21})$. Thus we have proved Lemma 7 and completed the
proof of the theorem.



5 Applications 

In this section, we briefly discuss some applications of the above data structure. 
In the above presentation, one important issue left unspecified in our method is 
the way to handle the corridors. We will present different structures dependent 
on the application requirements. 



5.1 Collision Detection 

A major motivation for maintaining a decomposition of free space is to detect
collisions between moving objects [EGSZ99, BEG+99]. If for each corridor, we add an
inner bi-tangent line between the two convex chains bounding the corridor, we
can detect collision between the objects involved (refer to [EGSZ99] for why
we prefer tangent based separation to the separation based on the closest pair).
In [EGSZ99], efficient hierarchical methods are developed to reduce the number
of events associated with tracking tangents. Those methods can be used in our
setting as well.



5.2 Retraction Motion Planning 

The compact Voronoi diagram can be used to do retraction motion planning 
(so that the robot finds a path that stays maximally far from the obstacles). 
In retraction motion planning, we need to know the narrowest passage between 
two convex polygons. For this purpose, we may maintain the closest pair of 
features between two convex chains in a corridor. In [LC91, Mir97], there are
local conditions given to check if a pair of features is the closest pair. It is not
hard to see that such a condition can be used to certify and then maintain the 
nearest pair. 



6 Conclusion

We have shown how to maintain a partition of the free space outside $k$ moving
convex polygons in the plane into triangles and corridors. In each cell of this
partition the closest obstacle is one of the two or three polygons defining the
corridor or triangle respectively. Our structure continuously maintains $O(k)$ polygon
pairs among which must be the closest pair of polygons. With the addition of
a simple corridor collision test, as outlined above, the kinetic compact Voronoi 
diagram subsumes both the broad and narrow phases as commonly defined in 
the collision detection literature. Unlike more classical methods, our structure 
can easily accommodate deforming obstacles, as long as they stay convex. An 
extension of our structure to 3D would be interesting. 

Abstract. Planar point location is among the most fundamental search
problems in computational geometry. Although this problem has been
heavily studied from the perspective of worst-case query time, there has
been surprisingly little theoretical work on expected-case query time.
We are given an $n$-vertex planar polygonal subdivision $S$ satisfying some
weak assumptions (satisfied, for example, by all convex subdivisions).
We are to preprocess this into a data structure so that queries can be
answered efficiently. We assume that the two coordinates of each query
point are generated independently by a probability distribution also
satisfying some weak assumptions (satisfied, for example, by the uniform
distribution).

In the decision tree model of computation, it is well-known from information
theory that a lower bound on the expected number of comparisons
is $\mathrm{entropy}(S)$. We provide two data structures, one of size $O(n^2)$ that
can answer queries in $2\,\mathrm{entropy}(S) + O(1)$ expected number of comparisons,
and another of size $O(n)$ that can answer queries in
$(4 + O(1/\sqrt{\log n}))\,\mathrm{entropy}(S) + O(1)$ expected number of comparisons. These
structures can be built in $O(n^2)$ and $O(n \log n)$ time respectively. Our
results are based on a recent result due to Arya and Fu, which bounds
the entropy of overlaid subdivisions.



1 Introduction 

Planar point location is certainly among the most fundamental search problems 
in computational geometry. Given a polygonal subdivision S in the plane, the 
problem is to construct a data structure so that given any query point q in the 
plane, it is possible to determine efficiently which polygon of the subdivision contains
$q$. This problem has been heavily studied in computational geometry. (For
example, a search for “point location” found 77 papers in the computational geometry
bibliography.) With only a few exceptions, previous work on this problem
has dealt with the worst-case complexity of this problem. When expected-case 
complexity has been considered, it has been done under the assumption that 
both the subdivision and the query points are selected subject to various as- 
sumptions on distribution. Here, we consider search algorithms that are efficient 
in the expected-case for queries, and in the worst-case for subdivisions. 

The planar point location problem is a generalization of the well-known one- 
dimensional search problem. In the one-dimensional case, we are given a set of 
n keys, and told the probabilities of accessing each key and the n + 1 failure 
probabilities of falling in the gaps between the keys. If we assume that the 
probability of matching a key is zero, then this reduces to the expected-case 
complexity of solving a point location problem for $n + 1$ disjoint subintervals of
the unit interval. Consider any binary search tree whose leaves correspond to 
the intervals. It is easy to see that the expected number of comparisons is given 
by the weighted external path length M of the tree, where the weight of a leaf 
is the probability of the query point lying in the associated interval. 

Let $p_i$ denote the probability of falling in the $i$th interval. A fundamental
information theoretic result due to Shannon implies that the weighted path
length of any binary tree (and hence the expected number of comparisons) is at
least the entropy of the probability distribution

$$H = \sum_{i} p_i \log \frac{1}{p_i}.$$

(Unless otherwise stated, all logarithms are base 2.) Knuth shows how to
construct an optimum binary search tree in $O(n^2)$ time using dynamic programming.
Hu and Tucker presented a bottom-up construction of the tree, which
takes $O(n \log n)$ time, but is quite complex. Mehlhorn gives a simple construction
of a binary search tree whose weighted path length is within a constant
additive term of the entropy-based lower bound. It is eminently natural to ask
whether these results can be extended to planar subdivisions. To the best of
our knowledge, this is the first paper to address this obvious and fundamental
problem.
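
The one-dimensional bound can be checked numerically. The sketch below is only an illustration under stated assumptions: it is not Knuth's, Hu and Tucker's, or Mehlhorn's construction, but a simple weight-balancing heuristic (the names `entropy`, `leaf_depths`, and `expected_comparisons` are hypothetical). It builds a binary tree over a non-empty sequence of intervals by splitting where the two sides have the most nearly equal total probability, then compares the weighted external path length with the entropy.

```python
import math

def entropy(probs):
    """Shannon entropy (base 2) of a probability vector."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def leaf_depths(probs):
    """Depth of each interval (leaf) in a tree built by recursive
    weight-balanced splitting of the interval sequence.
    Assumes probs is non-empty."""
    depths = [0] * len(probs)

    def build(lo, hi, depth):
        if hi - lo == 1:              # a single interval becomes a leaf
            depths[lo] = depth
            return
        total = sum(probs[lo:hi])
        best_split, best_gap, left = lo + 1, float("inf"), 0.0
        for s in range(lo + 1, hi):   # candidate split between s-1 and s
            left += probs[s - 1]      # left = sum(probs[lo:s])
            gap = abs(total - 2.0 * left)
            if gap < best_gap:
                best_split, best_gap = s, gap
        build(lo, best_split, depth + 1)
        build(best_split, hi, depth + 1)

    build(0, len(probs), 0)
    return depths

def expected_comparisons(probs):
    """Weighted external path length: sum of p_i * depth_i."""
    return sum(p * d for p, d in zip(probs, leaf_depths(probs)))

dyadic = [0.5, 0.25, 0.125, 0.125]
print(entropy(dyadic), expected_comparisons(dyadic))
```

For dyadic probabilities such as the ones above, the balanced tree meets the entropy lower bound exactly; for other distributions the path length is at least the entropy, as Shannon's result requires.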

Consider a polygonal subdivision $S$. Given a region $z$ in $S$, let $p_z$ denote the
probability that the query point lies inside region $z$. Define the entropy of $S$ to
be

$$\mathrm{entropy}(S) = \sum_{z \in S} p_z \log \frac{1}{p_z}.$$

The coordinates of the query points are assumed to be sampled independently
from probability distributions over bounded intervals of the $x$- and $y$-axes.
Both $S$ and the probability distributions are assumed to satisfy some additional
weak assumptions (see Section 2 for formal definitions).

Shannon’s lower bound applies to the planar point location problem as well.
We present two algorithms for the planar point location problem. The first uses
quadratic space and can answer point location queries in $2\,\mathrm{entropy}(S) + O(1)$
expected number of point-line comparisons (i.e., given a point and a directed
line, one has to determine whether the point lies to the left of, on, or to the right
of the line). The second uses $O(n)$ space, and can answer point location queries
in nearly $4\,\mathrm{entropy}(S) + O(1)$ expected number of point-line comparisons.

The paper is organized as follows. In Section 2 we present definitions and
state our results formally. In Section 3 we present background on the planar
point location problem. In Section 4 we present our algorithms for the case of
uniformly distributed query points, and in Section 5 we generalize our results to
a wider class of probability distributions.

2 Definitions and Main Results 

Let $I$ and $J$ be two arbitrary intervals of real numbers. In this paper, we only
work with planar subdivisions that partition an underlying rectangle $I \times J$ into
disjoint connected regions. We allow the underlying rectangle to be the infinite
plane, in which case $I = J = (-\infty, \infty)$.

Given a query point $q$, let $x_q$ and $y_q$ denote its $x$ and $y$ coordinates. Throughout
this paper, we assume that $x_q$ and $y_q$ are two independent random variables.
We denote the probability distribution function for $x_q$ by $P : I \to [0, 1]$ and the
probability distribution function for $y_q$ by $Q : J \to [0, 1]$. That is, $P(x)$ is the
probability that the random variable $x_q$ is less than or equal to $x$, and $Q(y)$ is the
probability that the random variable $y_q$ is less than or equal to $y$. We call $(P, Q)$
a well-behaved distribution if $P$ and $Q$ are continuous and strictly increasing.
For example, if $I \times J$ is the unit square, then picking $x_q$ and $y_q$ uniformly and
independently from $[0, 1]$ yields a well-behaved distribution.

Let $U$ be the unit square $[0, 1]^2$. We define a mapping $f_{PQ}$ from $I \times J$ (call
this geometric space) to $U$ (call this probability space) as follows:

$$f_{PQ}(x, y) = (P(x), Q(y)).$$

If $(P, Q)$ is well-behaved, then $f_{PQ}$ is a bijection as $P$ and $Q$ are strictly increasing.
We can also generalize $f_{PQ}$ in the obvious way for a set of points. Let $A$ be
any set of points in $I \times J$. Then

$$f_{PQ}(A) = \{(P(x), Q(y)) : (x, y) \in A\}.$$

In this paper, we assume that each evaluation of $P$, $Q$, $P^{-1}$, and $Q^{-1}$ takes
constant time.
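
As a small illustration of this mapping, the following sketch applies it to a hypothetical well-behaved distribution on the unit square, with $P(x) = x^2$ and $Q(y) = y$ (these particular functions, and the helper names `make_map` and `rectangle_probability`, are assumptions made for the example, not part of the paper). It also shows why the probability space is convenient: the probability that the query falls in an axis-parallel rectangle equals the area of the rectangle's image.

```python
import math

def make_map(P, Q):
    """Return the coordinate-wise map (x, y) -> (P(x), Q(y))."""
    def mapped(point):
        x, y = point
        return (P(x), Q(y))
    return mapped

def rectangle_probability(P, Q, x0, x1, y0, y1):
    """Probability that the query lies in [x0, x1] x [y0, y1];
    this is the area of the image rectangle in probability space."""
    return (P(x1) - P(x0)) * (Q(y1) - Q(y0))

# Hypothetical well-behaved distribution on the unit square.
P = lambda x: x * x        # continuous, strictly increasing on [0, 1]
Q = lambda y: y            # uniform in y
P_inv = math.sqrt
Q_inv = lambda v: v

f_PQ = make_map(P, Q)              # geometric space -> probability space
f_PQ_inv = make_map(P_inv, Q_inv)  # probability space -> geometric space

print(f_PQ((0.5, 0.25)))           # (0.25, 0.25)
print(f_PQ_inv((0.25, 0.25)))      # (0.5, 0.25)
```

Because each coordinate is transformed separately, the inverse map is obtained by the same helper applied to $P^{-1}$ and $Q^{-1}$.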

We are now ready to state the main results of this paper. 

Theorem 1. Let $I$ and $J$ be two intervals of real numbers. Let $S$ be a planar
subdivision of $I \times J$ with $n$ vertices. Suppose that a well-behaved distribution $(P, Q)$
is given for the coordinates of the query point, and for each region $z \in S$, $f_{PQ}(z)$
has at most a constant number of holes and the perimeter of $f_{PQ}(z)$ is bounded
by a constant. Then

(i) Using $O(n^2)$ space and $O(n^2)$ preprocessing time, it is possible to answer
point location queries in $2\,\mathrm{entropy}(S) + O(1)$ expected number of point-line
comparisons.

(ii) Using $O(n)$ space and $O(n \log n)$ preprocessing time, it is possible to answer
point location queries in $(4 + O(1/\sqrt{\log n}))\,\mathrm{entropy}(S) + O(1)$ expected number
of point-line comparisons.

Clearly this theorem applies if $I \times J$ is the unit square $U$ and $x_q$ and $y_q$
are chosen uniformly and independently from $[0, 1]$. Thus we have the following
theorem, which is an interesting special case of Theorem 1.

Theorem 2. Let $S$ be a planar subdivision of $U$ with $n$ vertices. Suppose that
the coordinates of the query point are chosen uniformly and independently from
$[0, 1]$, and for each region $z$ in $S$, $z$ has at most a constant number of holes, and
the perimeter of $z$ is bounded by a constant. Then

(i) Using $O(n^2)$ space and $O(n^2)$ preprocessing time, it is possible to answer
point location queries in $2\,\mathrm{entropy}(S) + O(1)$ expected number of point-line
comparisons.

(ii) Using $O(n)$ space and $O(n \log n)$ preprocessing time, it is possible to answer
point location queries in $(4 + O(1/\sqrt{\log n}))\,\mathrm{entropy}(S) + O(1)$ expected number
of point-line comparisons.

Remark: It is worth noting that any convex polygon in the geometric space is 
mapped by fpQ to a region in the probability space that has bounded perimeter. 
This follows from the fact that any monotonic increasing (resp. decreasing) curve 
in the geometric space maps to a monotonic increasing (resp. decreasing) curve 
in the probability space. And the length of any monotonic curve in the unit 
square is bounded by 2. Thus Theorem 1 applies to any subdivision of the plane
into (bounded and unbounded) convex polygons. 



3 Background 

For the planar point location problem, let $n$ denote the number of vertices in
the subdivision. The early work of Dobkin and Lipton showed that a query
time of $O(\log n)$ and space $O(n^2)$ could be achieved. Lipton and Tarjan
showed that the space requirement could be reduced to $O(n)$, but their approach
was rather impractical. Since then a number of more practical methods have
been proposed. These include Kirkpatrick’s clever hierarchical method, the
separator method by Edelsbrunner et al., the persistent search tree method by
Sarnak and Tarjan, and the randomized incremental method by Mulmuley.
All of these are based on worst-case analyses. Recently Adamy and Seidel
presented an $O(n)$ space data structure that achieves a worst-case query time
of $\log n + 2\sqrt{\log n} + O(\log^{1/4} n)$ point-line comparisons, thus approaching the
worst-case information theoretic lower bound.

Existing work on expected-case performance has been based on the assumption
that both the subdivision and the queries satisfy certain probabilistic
assumptions. Edahiro et al. proposed a practical algorithm for planar point
location based on bucketing techniques. Their method may use $O(n^2)$ space in
the worst case. Methods using kd-trees, quad-trees, and R-trees are also popular
in practice, but their analyses do not hold in the worst case. Mücke et al. and
Devroye et al. have analyzed methods based on walking through subdivisions.
For Delaunay triangulations of uniformly distributed data sets, these methods
take expected time close to $O(n^{1/3})$ and $O(n^{1/4})$ in two and three dimensions,
respectively.

Goodrich et al. presented an interesting point location method, which
adapts to the query distribution. Intuitively, if a cell is accessed more frequently,
then the data structure is modified to ensure that the time for subsequent
accesses to the cell is reduced. They show that the amortized time complexity
for accessing cell $i$ in a sequence of $m$ queries is
$O(\min\{\log n, \log(t(i) + 1), \log(m/f(i))\})$, where $t(i)$ is the number of different
queries between two accesses to cell $i$, and $f(i)$ is the frequency of accesses to
cell $i$. A limitation of their
approach is that the cells are not the regions in the given subdivision; instead 
they are the trapezoids in the refined subdivision formed by passing a vertical 
line through each segment endpoint. This can adversely affect the query time. 



4 The Uniform Distribution Case 

We first present our techniques in attacking the case when $I \times J$ is $U$ and the
coordinates of the query point are chosen uniformly and independently from
$[0, 1]$. The techniques are based on certain box decompositions of the planar
subdivision $S$ of $U$. In the case of general well-behaved distributions $(P, Q)$,
the key insight is that the map $f_{PQ}$ transforms the problem in the geometric
space to a problem in the probability space, where we are to locate query points
in the subdivision $f_{PQ}(S)$ of $U$, and the coordinates of the query point are
chosen uniformly and independently from $[0, 1]$. Thus, for general well-behaved
distributions, it suffices to invoke the techniques for uniform distribution in the
probability space to organize a point location data structure. Given a query
point $q$, we use this data structure to locate the region $z'$ in $f_{PQ}(S)$ containing
$f_{PQ}(q)$, and the region in $S$ containing $q$ is then given by $f_{PQ}^{-1}(z')$. These claims
will be proved formally in Section 5.

In the following, we focus on the uniform distribution case. We first present
an algorithm, which uses $2\,\mathrm{entropy}(S) + O(1)$ expected number of point-line
comparisons. The data structure needs $O(n^2)$ space and can be built in $O(n^2)$ time.
Later we present another algorithm which reduces the space to $O(n)$ and the
preprocessing time to $O(n \log n)$. The expected number of point-line comparisons
goes up to nearly $4\,\mathrm{entropy}(S) + O(1)$.

A lemma proved in [2] will be very useful. We state it in a form which is
applicable in two dimensions. The result concerns overlaying two planar
subdivisions of $U$. One subdivision is the given planar subdivision $S$ of $U$. The
other subdivision is a decomposition of $U$ into cells that enjoys the following
properties, for some constants $c_a$ and $c_n$:
(A.1) Difference of Two Rectangles: A cell is the set-theoretic difference of two
axis-parallel rectangles, one enclosed within the other. We call these the
outer rectangle and inner rectangle of the cell. Note that the inner rectangle
need not be present. Given a cell $u$, we let $u_O$ and $u_I$ denote its outer and
inner rectangle, respectively. Also, we define the size of $u$, denoted by $s_u$, to
be the length of the longest side of $u_O$.

(A.2) Bounded Aspect Ratio: The outer rectangle and inner rectangle (if present)
have aspect ratio (ratio of longest to shortest side) bounded by $c_a$. (In this
case we say that the cell has aspect ratio at most $c_a$.)

(A.3) Stickiness: If the cell has an inner rectangle, then for each dimension, the
separation between the corresponding faces of the inner and outer rectangle
is either 0 or at least the length of the inner rectangle along that dimension.

(A.4) Proximity to $S$: For each cell $u$, there is some edge or vertex in $S$ within a
distance of $c_n \cdot s_u$ from any point in $u_O$.

(A.5) Disjointness: Given any two cells, either the outer rectangles of the two cells
are disjoint or the outer rectangle of one cell is contained within the inner
rectangle of the other.

We define a fragment to be a connected component in the intersection between
a cell in the decomposition and a region in $S$. Let $\mathcal{F}$ be the set of all
fragments. Let $\mathrm{area}(x)$ denote the area of region $x$.



Lemma 1. Let $S$ be a planar subdivision of $U$ such that each region has at
most a constant number of holes and the total boundary length of each region is
bounded by a constant $c_s$. Let $D$ be a decomposition of $U$ that satisfies properties
A.1, A.2, A.3, A.4, and A.5. Let $\mathcal{F}$ be the set of fragments in the overlay of $S$
and $D$. Then

$$\sum_{x \in \mathcal{F}} \mathrm{area}(x) \log \frac{1}{\mathrm{area}(x)} \le 2 \sum_{z \in S} \mathrm{area}(z) \log \frac{1}{\mathrm{area}(z)} + O(1),$$

where the constant in the $O$-notation depends on $c_a$, $c_s$, and $c_n$.



4.1 Quadratic Space Solution 

We prove Theorem 2(i) in this subsection. Let $S$ be the given planar subdivision
of $U$ such that each region has at most a constant number of holes, and the
total boundary length of each region is bounded by a constant. We construct a
hierarchical decomposition of $U$ by building a box-decomposition tree (BD-tree)
on the vertices of $S$. Initially, the BD-tree contains only one node which
is the root. Each node represents a cell and the root represents $U$. We keep
expanding the tree until some terminating condition is satisfied. The leaf cells
form the desired decomposition of $U$. We describe how to construct children for
a node $u$ below. For convenience, we also use $u$ to denote the cell it represents.

If $u$ contains at most one vertex, then $u$ is a leaf cell. Otherwise, it can be
guaranteed inductively that $u$ is a rectangle and we recursively construct two
children of $u$ as follows. Split $u$ orthogonally at the midpoint of its longest side
to obtain two rectangles $v$ and $w$. If both $v$ and $w$ contain some vertex, then we
make $v$ and $w$ children of $u$. This operation is called a midpoint split. Otherwise,
if $v$ or $w$ is empty, then we recursively apply the midpoint splitting rule to the
non-empty rectangle, until we obtain a rectangle $v'$ such that $v'$ will be split into
two non-empty rectangles. We make $v'$ and $u \setminus v'$ children of $u$. This operation
is called a shrink. Note that $u \setminus v'$ is a leaf cell and it contains no vertex.
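
The two operations can be sketched in a few lines. The code below is a naive illustration, not the efficient construction discussed next: it rescans the point list at every step, assumes the input points are distinct, and uses a half-open convention (a point on a splitting line goes to the upper half). The class `Node` and the helpers `halves`, `partition`, `build`, and `leaves` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    outer: tuple                      # (x0, y0, x1, y1)
    inner: Optional[tuple] = None     # set only for the leaf u \ v' of a shrink
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

def halves(rect):
    """Split rect at the midpoint of its longest side."""
    x0, y0, x1, y1 = rect
    if x1 - x0 >= y1 - y0:
        m = (x0 + x1) / 2
        return (x0, y0, m, y1), (m, y0, x1, y1), 0, m
    m = (y0 + y1) / 2
    return (x0, y0, x1, m), (x0, m, x1, y1), 1, m

def partition(rect, pts):
    """Distribute pts (all inside rect) over the two halves of rect."""
    low, high, axis, m = halves(rect)
    return ((low, [p for p in pts if p[axis] < m]),
            (high, [p for p in pts if p[axis] >= m]))

def build(rect, pts):
    node = Node(outer=rect, points=list(pts))
    if len(pts) <= 1:                 # at most one vertex: leaf cell
        return node
    (v, pv), (w, pw) = partition(rect, pts)
    if pv and pw:                     # midpoint split
        node.children = [build(v, pv), build(w, pw)]
        return node
    inner = v if pv else w            # shrink: follow the non-empty half
    while True:
        (a, pa), (b, pb) = partition(inner, pts)
        if pa and pb:                 # inner is v': its split is non-trivial
            break
        inner = a if pa else b
    node.children = [build(inner, pts), Node(outer=rect, inner=inner)]
    return node

def leaves(node):
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]

tree = build((0.0, 0.0, 1.0, 1.0), [(0.1, 0.1), (0.2, 0.2)])
print(tree.children[1].inner, len(leaves(tree)))
```

With two nearby points the root performs a shrink: the inner rectangle $v'$ is the first box whose midpoint split separates the points, and the remainder of the unit square becomes an empty leaf.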

To construct the tree efficiently, we use a standard trick due to Vaidya
for partitioning the points. We store the data points contained in a cell in $d$
separate lists, each sorted by one of the coordinates, that are cross-referenced with
each other. Instead of updating the lists after each split, we update them after a
sequence of splits is performed, until each of the resulting subsets contains fewer
than half the initial number of points. Also, assuming a model of computation
in which exclusive-or, integer floor, powers of 2, and integer logarithm can be
computed on point coordinates, the shrink operation can be performed in $O(d)$
time. (For example, see Bern [3].) A straightforward modification of the argument
given by Vaidya leads to a construction time of $O(n \log n)$. (We mention that
we can achieve the same construction time without using non-algebraic operations
by building the sliding-midpoint tree instead. It can be shown that
Lemma 1 holds for the fragments induced by the leaves of the sliding-midpoint
tree. The query algorithm and the rest of the analysis given in this section can
also be easily adapted.)

The cells associated with the leaves of the BD-tree satisfy properties A.1, A.2
($c_a = 2$), A.3, A.4 ($c_n = 2$), and A.5. In addition, the BD-tree has the following
property, which is important for our analysis.

Lemma 2. Let $T$ be a BD-tree constructed on some point set in $U$. For any
query point $q$, the number of point-line comparisons needed in traversing $T$ to
locate the leaf cell $y$ containing $q$ is at most $\log(1/\mathrm{area}(y)) + O(1)$.

Proof. Suppose that we arrive at a node representing a cell $u$ in traversing $T$
and we need to decide which of its child cells should be visited. Let $v$ and $w$
be its child cells. If $v$ and $w$ are formed by a midpoint split, then one point-line
comparison is needed to determine whether $v$ or $w$ contains $q$. Note that
the areas of $v$ and $w$ are both half the area of $u$. If $v$ and $w$ are formed by a
shrink, then one is a leaf cell, say $v$, and it encloses the other child cell, say $w$.
Let $i$, $1 \le i \le 4$, denote the number of sides that the inner box of $v$ does not
share with the boundary of $u$. Thus, it takes $i$ point-line comparisons to decide
whether $v$ or $w$ contains the query point $q$. Note that the area of $w$ is at most
$1/2^i$ times the area of $u$, for $1 \le i \le 4$. Therefore, in both cases of midpoint
split or shrink, if we spend $i$ point-line comparisons to decide the next child cell
to visit and this child cell is not a leaf cell, then the area of this child cell is
at most $1/2^i$ times the area of its parent. The area of $U$ is 1. Therefore, the
number of point-line comparisons needed to reach the leaf cell $y$ containing $q$ is
at most $\log(1/\mathrm{area}(y)) + O(1)$, where the $O(1)$ additive term comes from the
last $i$ point-line comparisons, $1 \le i \le 4$, spent at the parent of $y$.

Let $y$ denote any leaf cell of the BD-tree. Observe that $y$ is either a rectangle
containing at most one vertex of $S$, or it is the set-theoretic difference of an
outer and inner rectangle, in which case it contains no vertex of $S$. In each case
we partition $y$ into at most four rectangles whose interiors contain no vertex of
$S$ (we call them subcells). If $y$ is a rectangle and contains no vertex of $S$ in its
interior, then the subcell is $y$ itself. Otherwise if it contains a vertex of $S$ in its
interior, then we split it into two subcells by a vertical line passing through this
vertex. Otherwise it must be the set-theoretic difference of an outer and inner
rectangle. In this case we partition it into at most four subcells by passing lines
coinciding with the vertical sides of the inner rectangle.

Define a pseudo-fragment to be a connected component in the intersection of
any subcell with a region in $S$. Clearly each fragment is partitioned into at most
four pseudo-fragments. Let $z$ be any subcell. Observe that $z$ contains no vertex
of $S$ and intersects $O(n)$ edges of the subdivision $S$. Thus $z$ is partitioned into
at most $O(n)$ pseudo-fragments. Since the subdivision inside $z$ is so simple, we
can locate the pseudo-fragment in $z$ containing the query point by searching an
auxiliary structure associated with $z$.

If there is an edge that intersects two opposite sides of $z$, then let $s$ be one
of the sides intersected. The edges intersecting $s$ divide $z$ into super-fragments
which can be linearly ordered along $s$. (See Figure 1.) Each super-fragment is
either a pseudo-fragment by itself, or it is further subdivided by other edges
into pseudo-fragments which can be linearly ordered within the super-fragment.
(There are at most two super-fragments which are further subdivided; these are
shown shaded in the figure.) Thus, we first organize a weighted search tree
for the super-fragments with their areas as weights. Each super-fragment points
to another weighted search tree storing the linearly ordered pseudo-fragments
within the super-fragment (the areas of the pseudo-fragments are the weights in
this second level tree). If there is no edge that intersects two opposite sides of $z$,
we can do the above using any side $s$ of $z$.




Fig. 1. Super-fragments inside a subcell. 



A single query is now answered by first locating the leaf cell in the BD-tree
that contains the query point $q$. Then we determine which of the at most
four subcells associated with this leaf cell contains $q$. Then we query the auxiliary
structure for the subcell to locate the pseudo-fragment containing $q$. Each
pseudo-fragment lies inside a region in $S$ and hence we have the solution to the
query.

We analyze the time to answer a single query as follows. By Lemma 2, the
number of point-line comparisons needed to reach a leaf cell $y$ of the BD-tree is
at most $\log(1/\mathrm{area}(y)) + O(1)$. It takes $O(1)$ point-line comparisons to find the
subcell $z$ containing $q$. Then we query the auxiliary structure associated with $z$.
It is known that querying a weighted search tree takes at most $\log(K/k) + 2$
comparisons, where $K$ is the total weight of all the items, and $k$ is the weight
of the item being searched for. Therefore, querying the auxiliary structure takes
$\log(\mathrm{area}(z)/\mathrm{area}(z')) + \log(\mathrm{area}(z')/\mathrm{area}(x)) + 4$ point-line comparisons, where
$z'$ and $x$ are the super-fragment and pseudo-fragment containing the query
point, respectively. Since $\mathrm{area}(z) \le \mathrm{area}(y)$, the total number of point-line
comparisons is at most
$\log(1/\mathrm{area}(y)) + \log(\mathrm{area}(z)/\mathrm{area}(z')) + \log(\mathrm{area}(z')/\mathrm{area}(x)) + O(1) \le
\log(1/\mathrm{area}(x)) + O(1)$.

The probability of the query point lying in a pseudo-fragment $x$ is clearly
$\mathrm{area}(x)$. Thus, the expected number of point-line comparisons to answer a query
is at most

$$\sum_{x \in \mathcal{F}'} \mathrm{area}(x) \left( \log \frac{1}{\mathrm{area}(x)} + O(1) \right) = \mathrm{entropy}(\mathcal{F}') + O(1),$$

where $\mathcal{F}'$ is the set of pseudo-fragments. Since each fragment is partitioned into
at most four pseudo-fragments, it is easy to see that $\mathrm{entropy}(\mathcal{F}') = \mathrm{entropy}(\mathcal{F}) +
O(1)$. Therefore, the expected number of point-line comparisons is at most
$\mathrm{entropy}(\mathcal{F}) + O(1)$, which is at most $2\,\mathrm{entropy}(S) + O(1)$ by Lemma 1.
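
The step relating the entropy of the pseudo-fragments to that of the fragments can be spelled out as follows. This is a sketch of the standard argument rather than the authors' own derivation; here $\mathcal{F}$ denotes the set of fragments, $\mathcal{F}'$ the set of pseudo-fragments, and $x_1, \ldots, x_m$ (with $m \le 4$) the pseudo-fragments into which a fragment $x$ is partitioned.

```latex
\begin{align*}
\sum_{j=1}^{m} \mathrm{area}(x_j) \log \frac{1}{\mathrm{area}(x_j)}
  &= \mathrm{area}(x) \log \frac{1}{\mathrm{area}(x)}
     + \mathrm{area}(x) \sum_{j=1}^{m} \frac{\mathrm{area}(x_j)}{\mathrm{area}(x)}
       \log \frac{\mathrm{area}(x)}{\mathrm{area}(x_j)} \\
  &\le \mathrm{area}(x) \log \frac{1}{\mathrm{area}(x)} + \mathrm{area}(x) \log m
   \;\le\; \mathrm{area}(x) \log \frac{1}{\mathrm{area}(x)} + 2\,\mathrm{area}(x).
\end{align*}
```

The inner sum is the entropy of a distribution on $m \le 4$ outcomes, hence at most $\log 4 = 2$. Summing over all fragments, whose areas add up to 1, gives $\mathrm{entropy}(\mathcal{F}) \le \mathrm{entropy}(\mathcal{F}') \le \mathrm{entropy}(\mathcal{F}) + 2$.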

We analyze the space of the entire data structure. The space needed by
the BD-tree is $O(n)$. Since there are $O(1)$ subcells for each leaf cell, and $O(n)$
pseudo-fragments for each subcell, the auxiliary structure at each leaf cell also
takes $O(n)$ space. Thus, the total space is $O(n^2)$. As mentioned earlier the BD-tree
can be constructed in $O(n \log n)$ time. A weighted search tree of $m$ sorted
items can be constructed in $O(m)$ time. Thus, the auxiliary structure at each
leaf cell can be constructed in $O(n)$ time which leads to a total preprocessing
time of $O(n^2)$. This completes the proof of Theorem 2(i).

4.2 Linear Space Solution 

We prove Theorem 2(ii) in this subsection. First, we also build a decomposition
tree on the vertices of $S$, but it is different from the BD-tree in the quadratic
space solution. The cell at each node of the tree will be a rectangle of bounded
aspect ratio. We will classify the leaf cells of the tree into two types, S-type and
L-type. The root of the tree represents the unit square $U$. Inductively suppose
that we are to construct the children of a cell $u$.

1. If $\mathrm{area}(u) < 1/n$, we label $u$ an S-type leaf cell.

2. If $\mathrm{area}(u) \ge 1/n$ and $u$ intersects no edge of $S$, then $u$ must be completely
contained in some region of $S$; we store the name of this region with it. In
addition, we label $u$ an L-type leaf.

3. If $\mathrm{area}(u) \ge 1/n$, $u$ intersects some edge of $S$, and each region in $u \cap S$ has
area less than $1/n$, then we label it an S-type leaf cell.

4. Otherwise, $\mathrm{area}(u) \ge 1/n$, $u$ intersects some edge of $S$, and some region in
$u \cap S$ has area at least $1/n$. We split $u$ using a midpoint split into two cells
$v$ and $w$, and make them children of $u$. Then we recursively construct the
descendants of $v$ and $w$.
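
The four rules translate directly into a recursive procedure. The sketch below makes a strong simplifying assumption that is not in the paper: every region of $S$ is an axis-parallel rectangle, so that each region meets a cell in at most one connected piece and the piece's area is trivial to compute. The function names and the `regions` dictionary are hypothetical, and the naive area test here is not the efficient preprocessing method.

```python
def area(r):
    """Area of an axis-parallel rectangle (x0, y0, x1, y1)."""
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap_area(a, b):
    """Area of the intersection of two axis-parallel rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def midpoint_split(u):
    """Split u at the midpoint of its longest side."""
    x0, y0, x1, y1 = u
    if x1 - x0 >= y1 - y0:
        m = (x0 + x1) / 2
        return (x0, y0, m, y1), (m, y0, x1, y1)
    m = (y0 + y1) / 2
    return (x0, y0, x1, m), (x0, m, x1, y1)

def decompose(u, regions, n, leaves):
    """Append (type, cell, region_name) triples for the leaves below cell u.
    regions maps a region name to its rectangle; n is the vertex count."""
    pieces = {name: overlap_area(u, z) for name, z in regions.items()}
    pieces = {name: a for name, a in pieces.items() if a > 0.0}
    if area(u) < 1.0 / n:                            # rule 1
        leaves.append(("S", u, None))
    elif len(pieces) == 1:                           # rule 2: no edge crosses u
        leaves.append(("L", u, next(iter(pieces))))
    elif all(a < 1.0 / n for a in pieces.values()):  # rule 3
        leaves.append(("S", u, None))
    else:                                            # rule 4: midpoint split
        for child in midpoint_split(u):
            decompose(child, regions, n, leaves)
    return leaves

regions = {"A": (0, 0, 0.5, 1), "B": (0.5, 0, 1, 1)}
print(decompose((0, 0, 1, 1), regions, 6, []))
```

Since a cell is split only when its area is at least $1/n$, every leaf has area at least $1/(2n)$, which is the fact used in the space analysis.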

We denote this decomposition tree by $\mathcal{T}(S)$. The cells associated with the leaves
of the tree satisfy properties A.1, A.2 ($c_a = 2$), A.3, A.4 ($c_n = 2$), and A.5. We
also have the following result which is analogous to Lemma 2.

Lemma 3. For any query point $q$, the number of point-line comparisons needed
in traversing $\mathcal{T}(S)$ to locate the leaf cell $y$ containing $q$ is at most
$\log(1/\mathrm{area}(y)) + O(1)$.

The final step of preprocessing is to construct the worst-case planar point
location data structure for $S$ invented by Adamy and Seidel. This data structure
uses $O(n)$ space and can be constructed in $O(n \log n)$ time. A point location
query can be answered using $\log n + 2\sqrt{\log n} + O(\log^{1/4} n)$ point-line comparisons.

Given a query point $q$, we first descend $\mathcal{T}(S)$ to find the leaf cell $x$ containing
$q$. If $x$ is an L-type leaf cell, then we report the region of $S$ containing $x$ and
terminate. Otherwise, $x$ is an S-type leaf cell, and we simply resort to Adamy
and Seidel’s data structure to answer the point location query.



Space analysis. Each leaf cell of $\mathcal{T}(S)$ has area at least $1/(2n)$ and they partition
the unit square $U$. This implies that $\mathcal{T}(S)$ has $O(n)$ leaves and hence $O(n)$ nodes.
Adamy and Seidel’s data structure uses $O(n)$ space. Thus, the total space needed
is $O(n)$.



Query time analysis. Recall that a fragment is a connected component of
the intersection of the leaf cells of $\mathcal{T}(S)$ and $S$. Let $\mathcal{F}$ denote the set of all
fragments. By Lemma 1, we have $\mathrm{entropy}(\mathcal{F}) \le 2\,\mathrm{entropy}(S) + O(1)$. In the
following, we show that the expected number of point-line comparisons to answer
a query is $(2 + O(1/\sqrt{\log n}))\,\mathrm{entropy}(\mathcal{F})$ and so the desired bound of
$(4 + O(1/\sqrt{\log n}))\,\mathrm{entropy}(S) + O(1)$ follows.

We call a fragment large if its area is at least $1/n$, and small otherwise. Let
$\mathcal{F}_1$ and $\mathcal{F}_2$ denote the sets of large and small fragments, respectively. A large
fragment is exactly an L-type leaf cell and vice versa. Small fragments lie inside
S-type leaves.

We analyze the time to locate a query point q. Suppose that q lies inside 
a fragment x ∈ F. If x is large, then x is an L-type leaf, and the number 
of comparisons needed to reach x is log(1/area(x)) by Lemma 3. If x is 
small, then the query procedure will first locate the leaf cell y containing x and 
then query the worst-case data structure. By Lemma 3, the number of point-line 
comparisons needed to reach y is log(1/area(y)). Since area(y) ≥ area(x), 
log(1/area(y)) ≤ log(1/area(x)). Adding the number of point-line comparisons 
needed for querying the worst-case data structure, the total number of comparisons 
is at most log(1/area(x)) + log n + 2√(log n) + O(log^{1/4} n). Since x is small, 
area(x) < 1/n, which implies that log(1/area(x)) > log n. So the total number 
of comparisons is at most (2 + O(1/√(log n))) log(1/area(x)). 
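
The last step for a small fragment can be spelled out as follows. This is only a sketch of the arithmetic, assuming base-2 logarithms and log n ≥ 1; the abbreviation L is introduced here for brevity and does not appear in the original argument.

```latex
\begin{align*}
L &:= \log\frac{1}{\mathrm{area}(x)} > \log n, \\
2\sqrt{\log n} &= \frac{2}{\sqrt{\log n}}\,\log n \;\le\; \frac{2}{\sqrt{\log n}}\,L, \\
\log^{1/4} n &= \frac{\log n}{\log^{3/4} n} \;\le\; \frac{L}{\log^{3/4} n}
  \;=\; O\!\left(\frac{L}{\sqrt{\log n}}\right), \\
L + \log n + 2\sqrt{\log n} + O(\log^{1/4} n)
  &\le \left(2 + O\!\left(\frac{1}{\sqrt{\log n}}\right)\right) L .
\end{align*}
```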

The probability that q lies in a fragment x is clearly area(x). Thus, the 
expected number of point-line comparisons to answer a query is bounded by 

(2 + O(1/√(log n))) Σ_{x ∈ F} area(x) log(1/area(x)) = (2 + O(1/√(log n))) entropy(F). 

Preprocessing time. When we construct the child cells of a cell u during 
preprocessing, the most time consuming part of the construction is to determine 
whether each region in u ∩ S has area less than 1/n. We describe a method to 
carry out this computation efficiently. 

Define L_u to be the set of regions of area at least 1/n in u ∩ S. The observation 
is that any region in L_v at a child v of u must be contained in some region in 
L_u. Therefore, our strategy is to compute L_u for each node u inductively. 

Let r be the root of T(S), so r ∩ S = S. We simply traverse S in O(n) 
time to collect all regions of area at least 1/n in L_r. Inductively, let v be a 
child of u; we are to compute L_v. For each region z in L_u, we claim that we 
can compute the intersection z ∩ v in time proportional to the size of z. (Note 
that z ∩ v may consist of several connected components.) 

This can be done by clipping z with four halfplanes successively. We describe 
the first clipping as follows. Let ℓ be the bounding line of a halfplane. For 
convenience, denote the size of z by |z|. First, compute the O(|z|) intersections 
between ℓ and the boundary of z in O(|z|) time by brute-force. Second, apply 
Jordan sorting to sort these intersections in order of their appearance on ℓ. This 
can also be done in O(|z|) time. Third, start a clockwise traversal from some 
vertex of z within the halfplane. If we come to an intersection on ℓ, then we 
use the sorted list of intersections to jump to the next intersection along ℓ. The 
traversal stops when we come back to a visited vertex, and we have traversed 
the boundary of one connected component of the clipped z. Then we repeat 
the traversal from an unvisited vertex of z within the halfplane and so on until 
no such vertex is left. This traverses all connected components in the clipped z. 
Since each vertex of z and each intersection on ℓ is visited at most once, this also 
takes O(|z|) time. This completes the first clipping. Each subsequent clipping 
is done the same way. Since we have added at most O(|z|) new vertices after a 
clipping and there are four clippings, we conclude that each clipping takes O(|z|) 
time. 
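
The idea of clipping a region against the four halfplanes of a cell can be sketched as follows. Note that this sketch substitutes the simpler Sutherland-Hodgman pass for the Jordan-sorting traversal described above: it also takes time linear in |z| per halfplane, but it does not separate the clipped region into connected components (for a non-convex z it may return a single boundary with degenerate edges along the clipping line). The function names are hypothetical.

```python
def clip_halfplane(polygon, a, b, c):
    """Clip a polygon (vertices in boundary order) against a*x + b*y <= c.

    One Sutherland-Hodgman pass, used here as a stand-in for the
    Jordan-sorting traversal of the text.
    """
    def inside(p):
        return a * p[0] + b * p[1] <= c

    def crossing(p, q):
        # Point where segment pq meets the bounding line a*x + b*y = c.
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        t = fp / (fp - fq)
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    result = []
    for i, cur in enumerate(polygon):
        prev = polygon[i - 1]
        if inside(cur):
            if not inside(prev):
                result.append(crossing(prev, cur))
            result.append(cur)
        elif inside(prev):
            result.append(crossing(prev, cur))
    return result


def clip_to_cell(polygon, xmin, xmax, ymin, ymax):
    """Clip successively against the four halfplanes bounding a cell."""
    halfplanes = ((-1, 0, -xmin), (1, 0, xmax), (0, -1, -ymin), (0, 1, ymax))
    for a, b, c in halfplanes:
        polygon = clip_halfplane(polygon, a, b, c)
        if not polygon:
            break
    return polygon


def polygon_area(polygon):
    """Unsigned area by the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x0, y0 = polygon[i - 1]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0
```

After clipping, the area of the result can be compared against the threshold 1/n; separating the components, as the text requires, would need the traversal with sorted intersections.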

After obtaining z ∩ v, we can then retain only the components in z ∩ v that 
have area at least 1/n and include them in L_v. Repeating this for each region in 
L_u yields L_v. The total time needed is then proportional to the sum of sizes of 
regions in L_u. Let E_i denote the set of edges on the boundaries of regions in 
L_u for all nodes u at level i of T(S). We claim that for any level i, the number 
of edges in E_i is O(n). Since T(S) is constructed using midpoint split and each 
leaf cell has area at least 1/(2n), the number of levels in the tree is O(log n), and 
it follows that the preprocessing time is O(n log n). 

To see that there are O(n) edges in E_i, note that there are two categories of 
edges in E_i. The first category consists of edges that lie on the sides of a cell, 
and the second category consists of edges that lie on edges of S. Observe that 
edges in the first category can be charged against edges in the second category, 
so we only need to show that the number of edges in the second category is O(n). 
The second category can be divided into two groups. The first group consists of 
edges that are incident to a vertex of S inside a cell at level i, and the second 
group consists of the remaining edges. It is clear that the number of edges of the 
first group can be no more than the total degree of the vertices of S, which is 
O(n). To count the number of edges of the second group, first observe that the 
number of regions with area at least 1/n in cells at level i is at most n. Second, 
each such region can have at most four boundary edges that are not incident to 
a vertex of S inside a cell at level i. Thus the number of edges of the second 
group is at most 4n. Hence the total number of edges in E_i is O(n), which is the 
desired claim. 



5 General Well-Behaved Distributions 



Given a planar subdivision S and a well-behaved distribution (P, Q), our main 
idea is that we can organize our point location data structure (the quadratic 
space version or linear space version) in Theorem 0 in the probability space. 
Then given a query point q, we locate the region z' in f_PQ(S) containing f_PQ(q) 
and then return f_PQ^{-1}(z'). 

For the above strategy to work, there are several requirements. First, the 
x- and y-coordinates of f_PQ(q) should be uniformly and independently chosen 
from [0, 1]. Second, f_PQ(S) is a planar subdivision, and each region has at most 
a constant number of holes and the perimeter of each region is bounded by a constant. 
Third, if we were to apply Theorem 0 directly, we would require that f_PQ(S) be 
a polygonal planar subdivision, but this is usually untrue. Instead, we will map 
back and forth between the geometric and probability spaces using f_PQ and f_PQ^{-1} 
to construct and query our data structure in the probability space. We describe 
below how these requirements are satisfied. 

Let x'_q be the x-coordinate of f_PQ(q). The probability prob(x'_q ≤ x') that x'_q 
is less than or equal to x' for some 0 ≤ x' ≤ 1 is equal to prob(P(x_q) ≤ x'), 
where x_q is the x-coordinate of q. But prob(P(x_q) ≤ x') = prob(x_q ≤ P^{-1}(x')), 
which is equal to P(P^{-1}(x')) = x' by definition of P. So prob(x'_q ≤ x') = x' and 
x'_q is uniformly picked from [0, 1]. Similarly, we can show that the y-coordinate 
of f_PQ(q) is uniformly picked in [0, 1]. 
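
The identity used above can be checked numerically on a concrete example. The sketch below assumes, only for illustration, that P is the cumulative distribution function of an exponential distribution with a hypothetical rate parameter RATE; any strictly increasing continuous P with a computable inverse would serve equally well.

```python
import math

RATE = 1.5  # hypothetical rate parameter, chosen only for this example


def P(x):
    """Example cumulative distribution function (an assumption, not from the text)."""
    return 1.0 - math.exp(-RATE * x)


def P_inv(u):
    """Inverse of P on [0, 1)."""
    return -math.log(1.0 - u) / RATE


def prob_mapped_at_most(x_prime):
    """prob(P(x_q) <= x') = prob(x_q <= P^{-1}(x')) = P(P^{-1}(x'))."""
    return P(P_inv(x_prime))
```

For every x' in [0, 1) the function prob_mapped_at_most returns x' up to rounding, which is exactly the statement that the mapped coordinate is uniform on [0, 1].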

Given two real numbers α, β ∈ I, since P is strictly increasing, α < β iff 
P(α) < P(β), and equality holds exactly when α = β. Therefore, the left-right 
ordering of points by x-coordinate in I × J is preserved in U after the mapping. 
A similar reasoning about Q shows that the above-below ordering of points by 
y-coordinate is also preserved. Also, a point p is on a line segment ℓ in I × J 
iff f_PQ(p) is on f_PQ(ℓ) in U. Thus, incidence relations in S are preserved in 
f_PQ(S) and hence f_PQ(S) is a planar subdivision of U. In Theorem ^ it is 
already assumed that each region in f_PQ(S) has at most a constant number of 
holes, and the perimeter of each region is bounded by a constant. 

We now deal with the issue that f_PQ(S) may not be a polygonal planar subdivision. 
In constructing our data structure in U, we need to perform two primitives. 
The first primitive is to determine whether a vertex lies above, below, to the left, 
or to the right of an orthogonal line. (This is needed in shrinking.) The second 
primitive is to compute the intersection between a (possibly curvy) edge and 
an orthogonal line segment. (This is needed in midpoint split and shrinking.) 
Since ordering and incidence relations are preserved, these two primitives can be 
provided by first going back to the geometric space, performing the computation, 
and mapping the result back to the probability space. Note that a vertical/horizontal 
line segment is always mapped to a vertical/horizontal line segment and vice 
versa. Also, for the second primitive, we would be intersecting the preimage of 
the curvy edge under f_PQ, which must be a line segment, with an orthogonal line 
segment in the geometric space. This can be done in constant time in the geometric 
space, and then we map the result using f_PQ to the intersection desired in the 
probability space. 

To see the correctness of our approach to answer a query, first observe that, 
by continuity of P and Q, a closed curve in I × J is mapped by f_PQ to a closed 
curve in U. Since ordering and incidence relations are preserved, a point p lies 
inside/on/outside a closed curve C in I × J iff f_PQ(p) lies inside/on/outside 
f_PQ(C). Thus, given a query point q in the geometric space and a region z' in 
f_PQ(S) containing f_PQ(q), f_PQ^{-1}(z') is the region in S containing q. In searching 
our data structure, we need to tell whether the query point f_PQ(q) in U lies 
above, below, to the left, or to the right of an orthogonal line or a curvy edge. 
We have seen that this can be done for an orthogonal line. For a curvy edge, 
we simply return the relation between q and the preimage of the edge under f_PQ, 
which must be a line segment, in the geometric space. This establishes the 
correctness of our approach to answer a query. 

In all, Theorem E holds assuming that each evaluation of the functions P, Q, 
P^{-1}, and Q^{-1} takes constant time. 

Abstract. We consider the following motion planning problem for a 
point robot inside a simple polygon P: starting from an arbitrary point 
s of P, the robot aims at reaching the closest point t of P from where the 
entire polygon P can be seen; the robot does not have complete knowl- 
edge of P but is equipped with a 360-degree vision system that helps 
it “see” its surrounding space. We are interested in a competitive path 
planning algorithm, i.e., one that produces a path whose length does not 
exceed a constant c times the length of the shortest off-line path (in this 
case, c × distance(s, t)); the constant c is called the competitive factor. 
In this paper, we present a new strategy that achieves a competitive 
factor of ~3.126, improving over a 4.14-competitive strategy of Icking 
and Klein and a 3.829-competitive strategy of Lee et al. Our strategy 
possesses two additional advantages: first, the first point reached from 
where the entire polygon P is seen is precisely the closest such point to 
the starting position s, and second, all the points of the path are directly 
determined in terms of s and of polygon vertices, which implies that an 
actual robot following the strategy is not expected to deviate much from 
its course due to numerical error. The competitiveness analysis is based 
on properties of the class of curves with increasing chords. 

Keywords: Motion planning, competitive algorithm, kernel, simple 
polygon, curve with increasing chords. 



1 Introduction 

The field of robot motion planning has received considerable attention during 
the 1980s, but research intensified in the late 1980s when technological advances 
allowed the autonomous function of robots. This, along with the need for au- 
tonomous robots to undertake tasks that may be dangerous for humans (areas 
polluted by chemicals, space exploration, etc.), led to a number of results pertain- 
ing to motion planning problems in unknown or partially known environments 
(see P] for a survey) . The general motion planning problem for an autonomous 
robot involves devising a strategy which can help the robot to get to a destina- 
tion point in an environment which is being “discovered” by means of a vision 
system (or tactile sensing in some early work). Most motion planning problems 
are being modeled as two-dimensional problems where the robot is a point moving 
inside or around polygonal shapes. This is not really restrictive, as real-world 
problems can be reduced to this formulation by means of transformations of the 
geometric boundaries of the objects in the robot’s world (Minkowski sum, etc.). 

M.M. Halldorsson (Ed.): SWAT 2000, LNCS 1851, pp. .8B7- R^ 2000. 
(c) Springer-Verlag Berlin Heidelberg 2000 

L. Palios 

Of course, one is interested in having strategies which guarantee that the 
path traveled by the robot up to its destination is no more than a constant times 
the length of the shortest path if the environment was completely known. Such 
strategies are called competitive and the ratio of the length of the actual path 
traveled over the length of the shortest path is called the competitive factor. In 
other words, the competitive strategies guarantee that the effort expended is not 
far from the optimal. Research results have indicated that finding competitive 
strategies for different motion planning problems exhibits varying degrees of 
difficulty (from obtaining constant competitive solutions to proving that finding 
a competitive solution is P-SPACE complete; see ID. !E|). 

In this paper, we consider the problem of planning the path of a robot inside 
a polygon from any given starting position to a point from where the entire 
polygon can be seen; in fact, the closest such point to the starting position 
is sought. The robot is equipped with a 360-degree vision system. This is the 
problem of reaching the kernel of a polygon, and is what a mechanical guard 
is called to solve in order to position itself so that it watches its territory. The 
problem has been considered by Icking and Klein who described a strategy to 
reach the closest point of the kernel at a competitive factor of ~5.48; a tighter 
analysis by Lee and Chwa showed that the strategy is ~4.14-competitive. 
Icking and Klein also showed that no competitive factor less than √2 can be 
achieved. A different strategy with a competitive factor of ~3.829 was later 
described by Lee et al., while Lopez-Ortiz and Schuierer improved the 
lower bound to ~1.48. Lopez-Ortiz and Schuierer also noted that the competitive 
factor of PI is not guaranteed for negative instances (i.e., when the polygon has 
empty kernel) and described a strategy that is guaranteed to work even in this 
case at a competitive factor of ~46.35. 

Our work contributes a new strategy for reaching the kernel of an unknown 
polygon P with nonempty kernel which achieves a competitive factor of ~3.126. 
The path consists of line segments and circular arcs whose total number is linear 
in the size of P. Our strategy is designed so that the robot walks into the kernel 
at precisely the point that is closest to the starting position; additionally, it 
has the advantage that any point of the course is determined by the starting 
position of the robot and vertices of P, and therefore an actual robot following 
the strategy is not expected to deviate much from its course due to accumulated 
numerical errors. The competitiveness analysis is based on properties of the 
class of curves with increasing chords. Experimental results suggest that 
the strategy performs better than the theoretical competitive factor. (A similar 
strategy has been used in [Z] for motion planning in a street-polygon.) 

The paper is structured as follows. In Section 2 we review the terminology 
that we use throughout the paper, and in Section 3 we outline our strategy 
and state some of the properties of the resulting path. In Section 4 we establish 
the competitive factor of the strategy, and in Section 5 we conclude with final 
remarks and open questions. 

2 Terminology 

A simple polygon is the region enclosed by a single closed non-self-intersecting 
polygonal line; thus, a simple polygon does not have “holes” in it. The set of all 
points p of a simple polygon P such that the line segment that connects p with 
any other point of P lies entirely in P is called the kernel of the polygon. If we 
define the inner halfplane of an edge as the closed halfplane which is defined by 
the edge and contains all the points of P in a sufficiently small neighborhood 
of the edge’s midpoint, then the kernel of P is equal to the intersection of the 
inner halfplanes of all the edges of P and is therefore convex. 
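
The characterization of the kernel as an intersection of inner halfplanes translates directly into a membership test. The following is a minimal sketch, assuming the polygon is given as a list of vertices in counterclockwise order (so that the interior, and hence the inner halfplane, lies to the left of every directed edge); the function names are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if b is left of oa)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_kernel(polygon, p):
    """True if p lies in the inner halfplane of every edge.

    Assumes `polygon` is a simple polygon with vertices in counterclockwise
    order; the inner halfplane of an edge is then the closed halfplane to
    the left of the directed edge.
    """
    n = len(polygon)
    return all(
        cross(polygon[i], polygon[(i + 1) % n], p) >= 0 for i in range(n)
    )
```

For an L-shaped polygon, for example, the test accepts exactly the points of the square at the corner of the L, from which both arms are fully visible.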

We will follow the terminology of Icking and Klein; we briefly summarize it in 
this paragraph. From its starting position s, the robot probably does not see 
parts of the polygon P in which it stands; if the robot sees all of P, then s 
belongs to the kernel and the robot need not move. The hidden portions of the 
polygon are called caves. Each cave is adjacent to a reflex vertex of P, whose very 
existence creates the cave; these reflex vertices are called constraint vertices 
(Figure 1). A cave (associated with a constraint vertex v) is characterized as 
either left if it lies to the left of the directed line sv, or right otherwise. By 
extension, we say that a vertex is a left constraint vertex if it is a constraint 
vertex associated with a left cave, and similarly for a right constraint vertex. In 
Figure 1, the vertices v and w are left constraint vertices, and the shaded regions 
next to them are the associated caves; the vertex u is a right constraint vertex. 
For each of the constraint vertices v, we define its inner halfplane with respect 
to the current position p as the closed halfplane which is delimited by the line pv 
and does not contain the corresponding cave. 

From its starting position, the robot may detect zero or more left caves and 
zero or more right caves. If the robot sees at least one left cave, the following 
lemma holds (see HH for a proof). 

Lemma 1. Suppose that from its starting position s in a simple polygon P the 
robot detects one or more left caves next to the constraint vertices l_1, ..., l_k 
(k ≥ 1). Suppose further that no left constraint vertex exists such that the closure 
of the complement of its inner halfplane contains all the left constraint vertices. 
Then, the kernel of P is empty. 

A similar lemma holds for the right constraint vertices. Therefore, if the conditions 
of Lemma 1 hold, we need do nothing, since the polygon has empty kernel. 
Otherwise, there is a left constraint vertex such that the closure of the complement 
of its inner halfplane contains all the left constraint vertices, and it is 
unique (if more than one such vertex is collinear with s, then we choose the 
one farthest away from s); we call this vertex the maximal left constraint vertex. In 
Figure 1, v is the maximal left constraint vertex. In a similar fashion, we have 
the maximal right constraint vertex. It can be proven that in a polygon with 
nonempty kernel, the left and right constraint vertices are not “intermixed” and 
this is why in papers on this problem which assume polygons with nonempty 
kernel, figures show the left and the right constraint vertices all gathered on the 
left and on the right of the polygon boundary respectively. 

Crucial in the analysis of our strategy is the notion of a curve with increasing 
chords; a curve has increasing chords if |ad| ≥ |bc| for any four points a, b, c, d 
lying on the curve in that order (|pq| denotes the length of the line segment 
connecting p and q). For a plane curve with increasing chords, Rote proved that 

Lemma 2. The length of a plane curve with increasing chords connecting 
two points a and b does not exceed 2π/3 times the length of the line segment 
connecting a and b. 
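
The definition of increasing chords can be tested by brute force on a sampled curve. The sketch below is only an illustration on sample points (it does not prove the property for the continuous curve); the helper names and the choice of circular arcs as test curves are assumptions made for the example.

```python
import math


def has_increasing_chords(points, eps=1e-12):
    """Check |ad| >= |bc| for all samples a, b, c, d taken in curve order.

    The pairs may share endpoints (a = b or c = d), so every chord nested
    inside the chord ad is compared against it.
    """
    n = len(points)
    for i in range(n):
        for l in range(i + 1, n):
            outer = math.dist(points[i], points[l])
            for j in range(i, l):
                for k in range(j + 1, l + 1):
                    if math.dist(points[j], points[k]) > outer + eps:
                        return False
    return True


def arc(total_angle, samples):
    """Evenly spaced samples of a unit-circle arc spanning total_angle radians."""
    step = total_angle / (samples - 1)
    return [(math.cos(m * step), math.sin(m * step)) for m in range(samples)]


def polyline_length(points):
    """Length of the polygonal line through the sample points."""
    return sum(math.dist(points[m - 1], points[m]) for m in range(1, len(points)))
```

A sampled semicircle passes the test, and its length divided by the distance of its endpoints is about π/2, below the bound 2π/3 of Lemma 2; an arc spanning 3π/2 fails, because its endpoints are closer together than some pair of points between them.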

We close this section with a well known geometric fact and another lemma. 

Fact 1. Consider a circle with diameter ab. Then, the angle apb of the triangle 
with vertices a, b, and p is less than, equal to, or greater than π/2 if p lies 
outside, on the boundary, or inside the circle, respectively. 

Lemma 3. Let C_1 be a connected non-self-intersecting curve which does not 
intersect the line segment connecting its endpoints a and b, and C_2 a convex 
polygonal line with the same endpoints which lies in the region enclosed by C_1 
and the line segment ab. Then, the length of C_2 does not exceed the length of C_1. 

Angle Notation: Since three points define two angles (which sum up to 2π), 
in the following, the notation abc (where a, b, c are three non-collinear points) 
is meant to indicate the smaller of the two corresponding angles. 

3 The Strategy 

The basic motivation behind our strategy stems from 
the study of the simplest case, i.e., a single reflex ver- 
tex v whose incident edges are not both visible from 
the starting position s. Since the robot does not know 
the direction of the invisible edge e incident upon v, it 
does not know where the closest point t of the kernel 
might be. However, in all cases, t belongs to the semi- 
circle with diameter sv, assuming that the semicircle 
lies in the polygon P (Figure 2). So, it seems a good 
idea to follow this semicircle. 
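
The claim that t lies on the circle with diameter sv can be checked numerically under one reading of the single-vertex case: we assume that t is the foot of the perpendicular from s onto the line supporting the invisible edge e through v, so that the angle stv is a right angle and Fact 1 applies. The direction angle theta of that line is unknown to the robot; the function names below are hypothetical.

```python
import math


def foot_of_perpendicular(s, v, theta):
    """Closest point to s on the line through v with direction angle theta."""
    dx, dy = math.cos(theta), math.sin(theta)
    t = (s[0] - v[0]) * dx + (s[1] - v[1]) * dy
    return (v[0] + t * dx, v[1] + t * dy)


def on_circle_with_diameter(s, v, p, eps=1e-9):
    """True if p lies on the circle whose diameter is the segment sv."""
    center = ((s[0] + v[0]) / 2.0, (s[1] + v[1]) / 2.0)
    radius = math.dist(s, v) / 2.0
    return abs(math.dist(center, p) - radius) < eps
```

Whatever the direction of the hidden edge, the foot of the perpendicular falls on the same circle, which is why following that circle visits every candidate position of t.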

Our strategy is based on this idea. Thus, the path of the robot consists of 
circular arcs and line segments; each circular arc belongs to a circle with diameter 
sp, where s is the starting position and p is a constraint vertex. This strategy 
makes the robot reach the kernel at its closest point to s.^1 

^1 It must be noted that this strategy is not optimal for the simple case of a single 
reflex vertex; it yields a worst-case competitive factor of π/2 ≈ 1.57. See |SI, for a 
proof that the optimal competitive factor is ~1.212, and for a strategy achieving it. 




Figure 2 

We first consider the one-sided case, where there are only left or only right 
caves; our strategy for the general case consists of applying the one-sided case 
strategy twice, first for the left caves until we see them all, and then for the right 
caves (if needed) . 



3.1 The One-Sided Case 

Without loss of generality, we consider the case where there are only left caves 
(the case where we have only right caves is similar). Until the robot sees all 
the left caves, there exist left constraint vertices and among them a maximal 
left constraint vertex, which may change as the robot moves. Initially, the robot 
finds the maximal left constraint vertex v_0 as seen from the starting position 
s and starts following the semicircle with diameter sv_0. The two fundamental 
cases that characterize the robot’s path are: 

1. A new maximal constraint vertex u is discovered. Then, the robot will start 
following the semicircle with diameter su (Figure 3: point a). Interestingly, 
the current location of the robot belongs to both semicircles. 

2. The cave next to the currently maximal constraint vertex u becomes visible. 
This implies that the second edge e incident upon u has become visible as 
well. Then, the robot at its current position, say, b, finds the new maximal 
constraint vertex. If no such vertex exists, then the entire polygon is visible 
and the robot has achieved its goal. If such a vertex exists — let it be v — 
and v is a constraint vertex just seen for the first time (for example, if v is 
the other endpoint of e), then we execute the previous case. The remaining 
possibility is if v is a constraint vertex that has already been seen, in which 
case the robot walks along the line segment bu trying to reach (if possible) 
the semicircle with diameter sv (Figure 3: points b and c). 

Note that it may be the case that the robot has to reach the currently 
maximal constraint vertex u in order to see the cave next to u. (This can 
only happen if u is the maximal left constraint vertex v_0 seen from s.) In 
this case, if there exists a new maximal constraint vertex w, w has to be 
a constraint vertex just discovered, for otherwise the polygon has empty 
kernel. The robot at u lies on or outside the semicircle with diameter sw, 
and it will try to walk along the line su away from s in an attempt to see 
the cave next to w. 

The above two cases do not take into account the fact that the robot may 
take advantage of what it has seen. Clearly, the kernel of the polygon P is a 
subset of the inner halfplanes of the edges of P and of the inner halfplanes 
of the constraint vertices. Since the robot seeks to locate the kernel, it seems 
reasonable that it should not leave the inner halfplane of any of the polygon 
edges or constraint vertices which it sees or has seen. To be able to do that, the 
robot maintains the free polygon which is the subset of P in which the robot 
may walk. Initially, the free polygon is the intersection of the inner halfplanes of 
the visible edges and the visible constraint vertices from the starting point s. As 
a new edge or a new constraint vertex becomes visible, the robot updates its free 
polygon by intersecting it with the corresponding inner halfplane. By requiring 
that the robot maintains the free polygon up to date and remains in it, we ensure 
that the portion of the polygon seen by the robot never decreases; at the same 
time, the free polygon keeps shrinking and when the robot reaches the kernel, 
the free polygon is precisely the kernel of P. Additionally, a left (resp., right) 
constraint vertex will remain so until both its incident edges become visible; it 
will not turn into a right (resp., left) constraint vertex, which might happen if 
the robot zig-zagged inside P. 

Figure 3    Figure 4 

At any time during its trip, the robot lies at a point, say, p, on the boundary 
of the current free polygon and it can only walk in the free polygon, that is, in the 
wedge delimited by the lines supporting the free polygon edges that are incident 
upon p. Since the free polygon is defined as the intersection of halfplanes, the 
opening angle of this wedge does not exceed π. Because the line supporting the 
edge to the left of p (with respect to the robot’s motion towards the interior 
of the free polygon) bounds the current free polygon from the left, we call it a 
left-bounding line; similarly, the line supporting the edge to the right of p is a 
right-bounding line. 

The following two cases complete the path planning strategy of the robot. 

3. The robot’s intended course leads or lies outside the free polygon. Then the 
robot walks along the boundary of the free polygon as close to the intended 
course as possible. In terms of left- and right-bounding lines, the robot walks 
along the left-bounding (right-bounding, respectively) line of the current free 
polygon if and only if the intended course leads to the left (right, respectively) 
of the free polygon. 

4. An edge that was not visible becomes visible. Then, the robot updates the 
free polygon by intersecting it with the inner halfplane of that edge. Note 
that this case has to be executed in case 2. 

An example is shown in Figure 4. It is important to note that the ending point 
h lies on the line supporting the edge which was seen last. Another important 
observation pertains to the way the value of the angle psv_0 behaves, where p 
denotes the current position of the robot on its way from s to h, and v_0 is the 
maximal left constraint vertex as observed from s. In the most general case, 
the following behavior of the angle psv_0 is exhibited: it is initially π/2, then it 
decreases, potentially reaching 0 but not decreasing below 0 (sub-path from s 
to f in Figure 4), and then it increases (sub-path from f to h). (Note that the 
robot may walk along sv_0.) However, two special cases may arise: first, the value 
of psv_0 is always decreasing from s to h (for example, consider the case that the 
caves of both v and w of Figure 4 were visible at f), and second, the value of 
psv_0 is always non-decreasing. The latter case may occur if, due to clipping, the 
left-bounding line of the free polygon is farther to the right from the semicircle 
with diameter sv_0; in this case, the robot will not follow any of the semicircles 
defined by s and the maximal left constraint vertices. 

Lemma 4. Suppose that the angle psv_0 decreases and then increases, reaching 
its minimum value when the robot is at the point x. Then, 

(i) x is either on or outside the corresponding semicircle, 

(ii) the part of the robot’s path past x lies outside the semicircle defined by s and 
the currently maximal constraint vertex. 

3.2 The General Case 

Our strategy for the general case consists of applying the one-sided strategy 
twice, first for the left caves and then for the right caves. Suppose that the robot 
is at point h, when it finally sees all the left caves. Then, the robot finds the 
maximal right constraint vertex u and updates its free polygon by intersecting 
it with the inner halfplane of u at h. The robot’s intention is to walk along the 
semicircle C_su with diameter su; however, it has to reach C_su first. To do this, 
the robot tries to walk along the line hu towards the semicircle; by walking in 
this direction, the robot neither gains nor loses visibility of the cave next 
to u. Of course, this course is subject to clipping about the free polygon; so, if 
the path along hu towards C_su leads outside the free polygon, the robot follows 
left-bounding lines if h is inside and right-bounding lines if h is outside C_su. 
The final path consists of two sub-paths, one from s to h and the other from h 
to the final point t, each similar to the path shown in Figure 4. That is, each one 
of them consists of a number of clipped circular arcs and line segments (cases 1 
and 2 of Section 3.1), potentially followed by one or more line segments that 
result from clipping whenever the corresponding semicircles fall outside the free 
polygon (Figures 5-7 show examples of paths). Our observation in Section 3.1 
about the behavior of the values of the angle psv_0 (where p is the robot’s current 
position and v_0 is the maximal left constraint vertex as observed from s) is 
extended and implies that, in the most general case, psv_0 is initially π/2, then 
decreases, potentially reaching 0 but not decreasing below 0, then it starts 
increasing, assuming values up to the angle u_0sv_0 (where u_0 is the maximal right 
constraint vertex as observed from s), and then it may start decreasing again up to 0. 

3.3 Simulating the Strategy 

The obvious way to simulate a motion strategy involves starting at the predetermined 
starting position and executing small steps applying the rules of the 
strategy. This method has the obvious disadvantage that a good approximation 
of the robot’s path requires a large number of steps, which may lead to increased 
execution time and large errors resulting from accumulated numerical errors at 
each step. 

A second approach is to split the given polygon P into regions in each of 
which the robot follows the same curve. Clearly, we will have to split P about 
the lines supporting the polygon edges incident upon reflex vertices. Moreover, 
we need to split P about lines that connect pairs of (left or right) constraint 
vertices that consecutively become maximal. To do that, we find the tree of 
shortest paths inside P from s to all the reflex vertices and we split P about 
the lines supporting the edges of this tree as well. Then, the robot can traverse 
any of the resulting regions in one computational step; the only computation 
in each region involves finding the points of intersection of the path with the 
region boundary. This method involves fewer steps compared to the previous 
one but it requires computing the partition of the polygon about the above 
mentioned lines; the total number of these lines is linear in the number n of 
polygon vertices. Building the partition requires O(n^2) space and it can be done
incrementally in O(n^2) time in a fashion similar to the incremental construction
of an arrangement of lines; see [?] and [?]. The free polygon is maintained by
turning on or off a bit associated with each region. 



3.4 Path Properties 

It is interesting to observe that every point of the robot’s path belongs either to a 
semicircle defined by the starting point and a vertex of the polygon P (a maximal 
constraint vertex) or to the line supporting an edge of P. This guarantees that an 
actual robot following our strategy is not expected to deviate from the intended 
course, as opposed to other strategies where this is possible because the motion 
of the robot is dependent on the current position. For example, in Icking and 
Klein’s strategy, the robot follows the bisector of an angle with apex the current 
position; but then, due to accumulated numerical error, the robot may deviate 
substantially from the expected course. 

Additionally, the following lemmata establish two important properties of 
the robot’s path (proofs can be found in [?]).

Lemma 5. The path resulting from the application of the above described strat- 
egy reaches the kernel of the polygon at the kernel’s point that is closest to the 
starting point s. 



Lemma 6. The path that the robot follows in accordance with our strategy
consists of O(n) line segments or circular arcs, where n is the number of vertices of
the polygon P.

4 Competitiveness Analysis

In order to compute the competitive factor of our strategy, we need to compute 
the worst-case ratio of the length of the path resulting from the application of 
our strategy over the length of the line segment connecting the starting point s to 
the ending point t. Obviously, the worst case scenario involves double application 
of the one-sided case. Our analysis relies on computing the competitive factor of 
an “augmented” path (we ignore (most of) the clipping) whose length is no less 
than the length of the actual path traveled. 

Before we describe the “augmentation” procedure, we review the important
stops in the robot’s path and define the l-path and r-path which will be used to
augment the path. The robot first applies the one-sided strategy trying to see
all the left caves; let h be the final point during this phase, that is, the point
from where all the left caves are visible. Then, the robot applies the one-sided
strategy again, for the right caves this time. As mentioned in Section 3.2, the
angle psv_0 (defined by the current position p of the robot, the starting position
s, and the maximal left constraint vertex v_0 observed from s) decreases, then it
may increase and finally it may decrease again; let x and y be the turning points
where these changes of monotonicity occur (if the robot walks along the line sx
or sy, we let x and y be the closest such points to s). Note that x may coincide
with h or may be before or after h along the robot’s path; y may coincide with t,
although this is not true in the most general case. Moreover, as mentioned earlier,
the point h lies on the line supporting the polygon edge that just became visible
at h; let l_h be that line. Then, l_h is a right-bounding line of the free polygon
at h. Similarly, the ending point t lies on the line l_t supporting the edge that
became visible last, and l_t is a left-bounding line of the free polygon at t.

We define the l-path as the path that the robot would follow if it only applied 
cases 1 and 2 of Section 3.1 from its starting position s until it either saw all 
the left caves or reached the line sx, whichever came first; in the former case, we 
extend the l-path by adding a line segment along the left-bounding line of the
free polygon from the l-path’s final point to the point of intersection with sx.
Because clipping is ignored, this left-bounding line supports a polygon edge next
to a maximal left constraint vertex; this edge is not necessarily the edge that
became visible last. In summary, the l-path consists of a sequence of circular
arcs (arcs sa, be of Figure 3) occasionally separated by a line segment along a 
line supporting an initially invisible polygon edge (segment be of Figure 3). We 
define the r-path similarly: this is the path that the robot would follow if it only 
applied cases 1 and 2 of Section 3.1 starting from s until it either saw all the 
right caves or reached the line sy; again, if the robot has seen all the right caves 
before it reached the line sy, we extend the r-path accordingly. We finally define 
the l-region as the closed region bounded by the l-path and the line sx; similarly,
the r-region is the closed region bounded by the r-path and the line sy. We note 
that: 

Observation 1. The point h from which all the left caves are finally visible does 
not belong to the interior of the l-region. Similarly, the final point t from which 
the entire polygon is visible does not belong to the interior of the r-region. 

The robot tries to follow the l-path and the r-path if possible, or otherwise
stay as close to them as possible. On its course from the starting point s to h (the
case is similar for the part from h to the ending point t), it follows (parts of) the
l-path, may move outside the l-region due to clipping about a left-bounding line
(when the l-path leads farther left than the left boundary of the free polygon), or
may move inside the l-region due to clipping about a right-bounding line (when
the l-path leads farther right than the right boundary of the free polygon). In
general, the robot may move in and out of the l-region several times; after it
has moved in, it may walk along several different right-bounding lines (tracing a
convex curve inside the l-region), whereas after it has moved out, it may follow
several different left-bounding lines (tracing a concave curve outside the l-region).
It is important to observe: 

Observation 2. The robot never follows a left-bounding line right after a right- 
bounding line (or vice versa) except at the point h where it sees all the left caves. 

The observation follows from the fact that the robot tries to stay as close to the 
corresponding semicircle as it can and if this is farther left (right, respectively) 
than the left (right, respectively) boundary of the free polygon, the robot will 
keep following the left (right, respectively) boundary of the free polygon until it 
reaches it, if ever. 

4.1 Augmenting the Robot’s Path 

Now we are ready to see how the actual robot’s path is being augmented; we 
will also define the points x' and y' which will be crucial in partitioning the 
augmented path into curves with increasing chords. We concentrate on the most 
general case in which x ≠ s (i.e., the angle psv_0 starts by decreasing) and x ≠ t;
the special cases where x = s or x = t yield smaller competitive factors (see
[?]). Note that y may or may not coincide with t.

1. the part of the robot’s path from s to x: We recall that x may be either
on the l-path or outside the l-region; in the latter case and if additionally
h coincides with or is reached after x, then the robot has been walking
along left-bounding lines from the last point of its course on the l-path
up to x. Recall also that h is either on the l-path or outside the l-region
(Observation 1); if it is outside the l-region, then again the robot has been
walking along left-bounding lines. In all cases where the robot walks along
left-bounding lines after it leaves the l-region (no matter whether h is reached
before or after x), the sub-path from s to x is augmented by considering the
entire l-path, followed by a line segment from the final point of the l-path to
x along sx (Figure 5); this includes as a special case the case where x belongs
to the l-path. It remains to consider the cases where the robot walks along
right-bounding lines. There are two cases to consider: first, h belongs to the
l-path, x is reached after h, and the robot walks along a right-bounding line
past h towards x, and second, h is outside the l-region, x is reached after
h, and the robot walks along a right-bounding line past h towards x; both
cases imply x = t and yield smaller competitive factors (see [?]).

2. the part of the robot’s path from x to y: We distinguish two cases depending
on whether the robot walks along left- or right-bounding lines past x.

(i) the robot walks along a left-bounding line past x. If the point h is before
x or coincides with x, then x must belong to the r-region for the robot to
follow a left-bounding line past x. We let x' be the point of intersection of
sx with the r-path, and we augment the sub-path from x to y by considering
the line segment xx', followed by the r-path up to its intersection with
the line sy, followed by the line segment from that point to y (Figure 6).
If the point h is after x, then past h the robot may walk along a left- or a
right-bounding line depending on whether h belongs to the r-region or not.
Let q be the point of intersection of the lines sx and l_h. If h is outside the
r-region, or if h belongs to the r-region but q does not, we set x' = q and we
augment the path by considering the line segment xx' (along sx), followed by
a line segment along l_h from x' to the point of intersection with the r-path,
followed by a line segment from that point to y along sy (Figure 7). If both
q and h belong to the r-region, then we let x' be the point of intersection of
the line sx with the r-path, and the sub-path from x to y is augmented by
considering the line segment xx' (along sx), followed by the r-path from x'
to its final point on the line sy, followed by the line segment from that point
to y (along sy); the situation is similar to the one depicted in Figure 6.

(ii) the robot walks along a right-bounding line past x. Then, h cannot be
before x, for, if h were reached before x, the robot must have been walking
along right-bounding lines from h to x; this implies that x = t, a contradiction
to the continuation of the path past x. Moreover, h cannot be after x
either; if h were reached after x, then h would be outside the l-region and
the robot would be walking along left-bounding lines from x to h. Therefore,
h = x, and we set x' = h. Additionally, h lies outside the r-region (otherwise,
the robot would not be following a right-bounding line past x). Let q be the
point of intersection of l_h with the r-path (if l_h intersects a line segment of
the r-path, then q is the point of intersection of l_h with the immediately
following semicircle); if the line l_h does not intersect the r-path, we let q be
the point of intersection of l_h and sy. Then, the sub-path from x to y is
augmented by considering the line segment xq along l_h, potentially followed
by the r-path from q to its intersection with the line sy (if q does not belong
to the line sy), followed by the line segment from that point to y.

3. the part of the robot’s path from y to the final point t: If y = t, then we set
y' = y = t. If y ≠ t, then the path past y lies outside the corresponding
semicircles (Lemma [?]) and the robot on its way to t walks along right-bounding
lines only. So, this part of the actual path is augmented by considering the
polygonal line formed by the segments yy' and y't, where y' is the point of
intersection of the lines sy and l_t (Figure 6).

It is important to observe that the augmented path does not cross itself. More- 
over, the augmented path proceeds along or to the left of the left-bounding lines 
that the robot follows, and along or to the right of the right-bounding lines, thus 
enclosing the actual robot’s path. Therefore, we have: 

Observation 3. The path traveled by the robot and the augmented path have the 
same endpoints. 

Observation 4. The path traveled by the robot can be produced by clipping the 
augmented path about the edges of a (shrinking) convex polygon. 

4.2 The Competitive Factor 

With respect to the points x' and y' , the augmented path can be seen as the 
concatenation of three sub-paths, one from s to x' , one from x' to y' , and one 
from y' to the final point t. The sub-path from s to x' consists of circular arcs 
occasionally separated by a line segment along a line supporting an initially 
invisible polygon edge (cases 1 and 2 of Section 3.1), potentially ending with a 
line segment along the line sx. The sub-path from x' to y' consists mainly of arcs 
and line segments (in accordance with cases 1 and 2 of Section 3.1) as well, but 
may begin with a line segment along l_h, and may end with a line segment along
the line sy; the sub-path may degenerate into a two-segment polygonal line, one
along l_h and the other along sy. Finally, the sub-path from y' to t is simply a
line segment. See Figures 6-7. More importantly, the following lemmata hold. 

Lemma 7. The (counterclockwise) angle sx'y' is at least equal to π/2.

Proof. The definition of the point x' in case 2 of the preceding section suggests
that we need to consider two cases. First, suppose x' is the point of intersection
of sx with the r-path (if x is inside or on the boundary of the r-region).
Then, x' lies on the semicircle of the currently maximal right constraint vertex
(case 1 of Section 3.1), or on the line supporting an edge incident upon a
right-constraint vertex which was initially invisible and became visible (case 2
of Section 3.1); in the latter case, x' lies inside the semicircle associated with the
right constraint vertex. In either case, if w is the right constraint vertex, then
the angle sx'w is at least equal to π/2. Moreover, the point y and (a fortiori) the
point y' lie on or to the left of the directed line x'w (see Figure 6). Therefore,
sx'y' ≥ sx'w, and the lemma follows.

Suppose now that x' is the point of intersection of sx with l_h; this case occurs
if h is reached after x, or x' = h and h lies outside the r-region. In either case,
x' coincides with or is farther away from s than x; since x is on the boundary
or outside the l-region (Lemma [?]), x' lies on or outside the semicircle of the last
maximal left-constraint vertex, say, v. Then, sx'v ≤ π/2 (Fact 1). The lemma
follows from the fact that y and (a fortiori) y' belong to the inner halfplane of
l_h and thus the angle sx'y' is at least equal to π − sx'v (see Figure 7). □

Similarly, 

Lemma 8. If y ≠ t, the (clockwise) angle sy't is at least equal to π/2.

Lemma 9. The sub-path of the augmented path from s to x' is a curve with 
increasing chords. 

Proof. For any point p (other than s), we define the quadrants A_p, B_p, C_p
and D_p at p as the four closed quadrants determined by the line sp and its
perpendicular at p: the quadrant A_p is the quadrant that contains s and lies to
the right of the directed line sp, while the other quadrants B_p, C_p and D_p follow
quadrant A_p in counterclockwise order around p. We first prove that for any
point p of this sub-path, the part of the augmented path from s to p belongs to
the closed quadrant A_p of p, while the part of the path from p to x' belongs to
the closed quadrant C_p of p. One needs to consider the different cases for p: on
a circular arc, at the intersection of two arcs, at the intersection of an arc and
a line segment, on a line segment. This follows from the fact that for any point
q of a semicircle with diameter ab, the angle aqb is equal to π/2 (see Fact 1).
(Figure 8 gives some examples for illustration purposes; the crosses indicate the
lines delimiting the quadrants.) Next, we consider 4 points a, b, c and d in that
order along the augmented path. We draw the corresponding quadrants for the
points b and c and draw the two lines l_b and l_c perpendicular to bc that pass
through b and c respectively (Figure 9). Since c belongs to the quadrant C_b of b,
l_b lies in the closure of the wedge defined by the quadrants B_b and D_b of b.
Similarly, since b belongs to the quadrant A_c of c, l_c lies in the closure of the
wedge defined by the quadrants B_c and D_c of c. Moreover, the point a lies in the
quadrant A_b of b, that is, to the left of l_b. Similarly, the point d lies in the
quadrant C_c of c, that is, to the right of l_c. Therefore, the length of ad is no less
than the perpendicular distance of l_b and l_c, which by construction is equal to
the length of bc. □

In a similar fashion, although with a little more effort because the sub-path 
of the augmented path between x' and y' may begin with a line segment, we can 
prove: 

Lemma 10. The sub-path of the augmented path from x' to y' is a curve with 
increasing chords. 
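
The increasing-chords property used in Lemmata 9 and 10 can be checked numerically on a sampled curve. The following is only a brute-force sketch under our own assumptions: the helper names `has_increasing_chords` and `arc` are hypothetical, the curve is approximated by finitely many sample points, and the quadruple loop is meant for illustration rather than efficiency. It tests the defining condition that, for any four points a, b, c, d in that order along the curve, the chord ad is at least as long as the chord bc.

```python
import math

def has_increasing_chords(points, tol=1e-9):
    """Brute-force test of |ad| >= |bc| for all sample indices a <= b <= c <= d."""
    n = len(points)
    for b in range(n):
        for c in range(b, n):
            inner = math.dist(points[b], points[c])
            for a in range(b + 1):
                for d in range(c, n):
                    if math.dist(points[a], points[d]) < inner - tol:
                        return False
    return True

def arc(total_angle, samples):
    """Sample points on a unit-circle arc spanning total_angle radians."""
    return [(math.cos(total_angle * k / (samples - 1)),
             math.sin(total_angle * k / (samples - 1)))
            for k in range(samples)]

# A semicircle has increasing chords; a three-quarter circle does not,
# because its endpoints are closer together than two diametrically opposite points.
semicircle_ok = has_increasing_chords(arc(math.pi, 31))
three_quarters_ok = has_increasing_chords(arc(1.5 * math.pi, 31))
```

A single circular arc of at most half a circle passes the test, which matches the role that semicircles play in the augmented path; the lemmata above are what extend the property to concatenations of arcs and line segments.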

Figure 8




From the above, we conclude 

Theorem 1. Our strategy has a competitive factor of $\sqrt{2(2\pi/3)^2 + 1} \approx 3.126$.

Proof. Clearly, the length of the actual path traveled by the robot is no more
than the length of the augmented path, as clipping with convex polygonal lines
or curves leads to reduced path length (Observation 4 and Lemma [?]). So, an
upper bound on the ratio of the length of the augmented path over the length of
the line segment st readily implies an upper bound on the competitive factor
that we seek. Figure 10 shows the skeleton of the augmented path in the worst
case; the angles α = sx'y' and β = sy't are at least equal to π/2 (Lemmata 7
and 8). Let us denote by ‖pq‖ the length of the path from p to q as opposed to
|pq| which denotes the length of the line segment pq. Then the competitive
factor r satisfies

\[
r \;\le\; \frac{\|sx'\| + \|x'y'\| + \|y't\|}{|st|}
  \;\le\; \frac{\frac{2\pi}{3}\bigl(|sx'| + |x'y'|\bigr) + |y't|}{|st|}
\]

Figure 10

since the augmented sub-paths from s to x' and from x' to y' are curves with
increasing chords (Lemmata 9 and 10) and therefore their lengths are not more
than 2π/3 times the lengths of the line segments sx' and x'y' respectively
(Lemma [?]), while the sub-path from y' to t is a line segment. If we apply the
law of sines in the triangles sx'y' and sy't, factor out the length |sy'|, and
maximize using partial derivatives, we find

\[
r \;\le\;
\frac{\dfrac{2\pi}{3}\,\dfrac{\sin\gamma + \sin(\alpha+\gamma)}{\sin\alpha} + \dfrac{\sin(\beta+\delta)}{\sin\delta}}
     {\dfrac{\sin\beta}{\sin\delta}}
\;\le\;
\frac{\dfrac{2\pi}{3}\sqrt{2} + \dfrac{\sin(\beta+\delta)}{\sin\delta}}
     {\dfrac{\sin\beta}{\sin\delta}}
\;\le\;
\sqrt{2\left(\frac{2\pi}{3}\right)^{2} + 1}
\]

where π/2 ≤ α < π, π/2 ≤ β < π, 0 < γ < π − α and 0 < δ < π − β: the term
(sin γ + sin(α + γ)) / sin α is decreasing as α increases and is thus maximized
for α = π/2 and γ = π/4; similarly, the overall fraction is maximized for β = π/2. □
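
The maximization at the end of the proof can be sanity-checked numerically. The sketch below is our own illustration, not part of the original analysis: the names `ratio_bound` and `grid_maximum` are hypothetical, and the expression evaluated is the law-of-sines bound with the angle names α, β, γ, δ used in the proof. It evaluates the bound at α = β = π/2, γ = π/4 and the maximizing δ, and runs a coarse grid search over the admissible ranges.

```python
import math

K = 2 * math.pi / 3  # length-to-chord factor for curves with increasing chords

def ratio_bound(alpha, beta, gamma, delta):
    """Law-of-sines upper bound on the competitive factor r."""
    detour = K * (math.sin(gamma) + math.sin(alpha + gamma)) / math.sin(alpha)
    last_leg = math.sin(beta + delta) / math.sin(delta)
    return (detour + last_leg) / (math.sin(beta) / math.sin(delta))

def grid_maximum(steps=24):
    """Coarse grid search over pi/2 <= alpha, beta < pi and the open ranges of gamma, delta."""
    best = 0.0
    for i in range(steps):
        alpha = math.pi / 2 * (1 + i / steps)
        for j in range(steps):
            beta = math.pi / 2 * (1 + j / steps)
            for k in range(1, steps):
                gamma = (math.pi - alpha) * k / steps
                for m in range(1, steps):
                    delta = (math.pi - beta) * m / steps
                    best = max(best, ratio_bound(alpha, beta, gamma, delta))
    return best

closed_form = math.sqrt(2 * K ** 2 + 1)
# With alpha = beta = pi/2 and gamma = pi/4 the bound becomes
# K*sqrt(2)*sin(delta) + cos(delta), which peaks at delta = atan(K*sqrt(2)).
at_optimum = ratio_bound(math.pi / 2, math.pi / 2, math.pi / 4,
                         math.atan(K * math.sqrt(2)))
```

Under these assumptions the value at the stated optimum agrees with the closed form of Theorem 1, and the grid search does not exceed it; a grid search is of course only a plausibility check and not a substitute for the analytic argument.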



5 Concluding Remarks — Open Problems 

We presented a strategy which enables a point robot to reach the point t of the 
kernel that is closest to the starting point s, and guarantees that the length of
the path traveled is not longer than 3.126 times the length of the line segment
st (that is, 3.126 times the shortest possible off-line path). Our strategy has 
the interesting feature that the robot reaches the kernel at precisely the closest 
point t. We note that the above competitive factor cannot be guaranteed when 
the polygon has empty kernel (in such cases, the competitive factor is defined 
as the ratio of the length of the path that a strategy imposes over the length of 
the shortest path which establishes that the kernel is empty), and this holds for 
all strategies where a point of the polygon seen by the robot never ceases to be 
in the robot’s visible region thereafter (enforced by means of the free polygon in 
this work, and by means of the gaining and keeping wedges in [?] and [?]).

Experimental results seem to suggest that the actual competitive factor is 
smaller than the theoretical competitive factor of 3.126. If true, it would be 
interesting to come up with tighter theoretical bounds on the competitive factor 
of our strategy. Of course, the ultimate open question is to invent strategies with 
smaller competitive factors which will close the gap between the current upper 
bound of ≈3.126 and the lower bound of ≈1.48. To this effect, perhaps ideas like
the ones in [?] may be of help.

Finally, better competitive solutions are needed for other motion planning 
problems in unknown environments. Lopez-Ortiz and Schuierer [?] have addressed
two interesting problems in this class: finding out whether a given polygon
is star-shaped (i.e., it has non-empty kernel), and locating a target (to be
recognized when seen) in a polygon with non-empty kernel. The currently best 
competitive factor for the first problem is 46.35. The currently best competitive 
factor for the second problem is 12.72 and is coupled with a lower bound of 9. 

Abstract. The double digest problem is a common NP-hard approach
to constructing physical maps of DNA sequences. This paper presents a
new approach called the enhanced double digest problem. Although this
new problem is also NP-hard, it can be solved in linear time under a
certain restriction, which is satisfied reasonably frequently.

Key words. DNA physical mapping, fast algorithms, graph-theoretic 
techniques, NP-hardness 



1 Introduction 

The physical mapping of DNA is a key problem in computational biology [?]. A
map of a DNA sequence consists of the locations of some given small sequences
such as GAATTC. Biologists use such maps in a preparatory step to determine
the target DNA sequence [?].

A common technique of constructing maps uses restriction enzymes to cut a 
DNA sequence at the positions where a particular short DNA sequence appears. 
These positions are called restriction sites. One approach to modeling map con- 
struction is the double digest (DD) problem. Given two restriction enzymes A 
and B, this approach cuts a target DNA sequence using enzyme A, enzyme B, 
and both enzymes, separately. It is a biological fact that the restriction sites for
enzymes A and B do not coincide. Throughout this paper, we make use of this 
fact. Let A, B and C be the three multisets of the lengths of the fragments 
formed after applying enzyme A, enzyme B and both enzymes to the target 
DNA sequence, respectively. Given A, B and C, the DD problem asks for per- 
mutations of the lengths in A and B such that if these sets of lengths are plotted 
on top of one another, the lengths of all the resulting subintervals formed due 
to overlapping match exactly the lengths in C. See Figure 1 for an example.

Many algorithms [?] have been proposed for the DD problem. Stefik
[?] gave the first algorithm using artificial intelligence. Fitch, Smith and Ralph
[?] reduced the DD problem to the set partition problem. Goldstein and Waterman
[?] approached this problem with a stochastic annealing heuristic for the
traveling salesman problem. They also showed that the DD problem is NP-hard
by reducing the set partition problem to it.

Fig. 1. Stripes (a), (b) and (c) show the fragments resulting from the applications of
enzyme A, enzyme B and both enzymes, respectively. In stripe (c), the subfragments
are created due to the overlapping between fragments in (a) and those in (b).

This paper suggests a new approach, called the enhanced double digest (EDD) 
problem. The EDD problem uses A, B, C and some additional length information;
see Section 2 for the details of the approach. Although the EDD problem is
still NP-hard, we show that if the lengths in C are all distinct, it can be solved 
in linear time. We also generalize the algorithm for the case where the number 
of duplicates in C is bounded by a constant. The time complexity of this gener- 
alized algorithm remains linear. Based on preliminary analysis, these constraints 
on duplicates in C can be satisfied with a reasonable probability. 

Section 2 details the new approach to define the EDD problem formally.
Section 3 gives the linear-time algorithm for the case where C is duplicate-free.
Also, it generalizes the algorithm to handle a small number of duplicate lengths.
Section 4 proves that the EDD problem is NP-hard. Section 5 concludes with
some directions for further work.



2 Problem Formulation 

Consider a target DNA sequence and two restriction enzymes A and B. 

• By applying enzyme A (respectively, B) to the target DNA sequence, we
obtain p (respectively, q) fragments. Let A = {a_1, ..., a_p} (respectively,
B = {b_1, ..., b_q}) be the multiset of the lengths of these p (respectively,
q) fragments.

• For i = 1, ..., p, let â_i be the fragment corresponding to a_i. We apply enzyme
B to the fragment â_i and obtain a set of subfragments. Let AB_i be the
multiset of the lengths of these subfragments.

• For j = 1, ..., q, let b̂_j be the fragment corresponding to b_j. We apply enzyme
A to the fragment b̂_j and obtain a set of subfragments. Let BA_j be the
multiset of the lengths of these subfragments.

For the example in Figure 1, the following length information is gathered:

• A = {a_1 = 9, a_2 = 12, a_3 = 15, a_4 = 17, a_5 = 37};

• B = {b_1 = 6, b_2 = 38, b_3 = 46};

• AB_1 = {3, 6}; AB_2 = {12}; AB_3 = {15}; AB_4 = {17}; AB_5 = {8, 29};

• BA_1 = {6}; BA_2 = {3, 8, 12, 15}; BA_3 = {17, 29}.

It is easily verified that the data found in this way has the following properties: 

Fact 1.

1. For i = 1, ..., p, $a_i = \sum_{c \in AB_i} c$. For j = 1, ..., q, $b_j = \sum_{c \in BA_j} c$.

2. $\biguplus_{i=1}^{p} AB_i = \biguplus_{j=1}^{q} BA_j = C$.

3. |C| = |A| + |B| − 1.

Proof. Straightforward.
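
To make the length information concrete, the following sketch derives AB_i, BA_j and C from a known arrangement of the fragments and can be used to check the three properties of Fact 1 on the example of Figure 1. It is only an illustration under our own assumptions: the function names `cut_positions` and `edd_data` are hypothetical, and the left-to-right orders of the fragments (9, 12, 15, 37, 17 for enzyme A and 6, 38, 46 for enzyme B) are one arrangement that is consistent with the listed data.

```python
from itertools import accumulate

def cut_positions(lengths):
    """Interior restriction sites of fragments laid out left to right."""
    return list(accumulate(lengths))[:-1]

def edd_data(order_a, order_b):
    """Return (AB, BA, C) as lists of subfragment lengths, in left-to-right order."""
    total = sum(order_a)
    assert total == sum(order_b), "both digests must cover the same sequence"
    sites_a, sites_b = cut_positions(order_a), cut_positions(order_b)
    assert not set(sites_a) & set(sites_b), "restriction sites must not coincide"
    cuts = sorted([0, total] + sites_a + sites_b)
    pieces = list(zip(cuts, cuts[1:]))          # subfragments of the double digest

    def split(order):
        groups, start = [], 0
        for length in order:
            end = start + length
            # subfragments lying inside the fragment [start, end]
            groups.append([hi - lo for lo, hi in pieces if start <= lo and hi <= end])
            start = end
        return groups

    return split(order_a), split(order_b), [hi - lo for lo, hi in pieces]

order_a = [9, 12, 15, 37, 17]   # assumed arrangement of the enzyme-A fragments
order_b = [6, 38, 46]           # assumed arrangement of the enzyme-B fragments
ab, ba, c = edd_data(order_a, order_b)

# Fact 1: fragment lengths are sums of their subfragments, and |C| = |A| + |B| - 1.
fact1_holds = (
    [sum(g) for g in ab] == order_a
    and [sum(g) for g in ba] == order_b
    and sorted(x for g in ab for x in g) == sorted(c)
    and len(c) == len(order_a) + len(order_b) - 1
)
```

For this arrangement the derived multisets coincide with those listed above, for instance AB for the fragment of length 37 is {8, 29} and BA for the fragment of length 38 is {3, 8, 12, 15}.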

Given A, B, AB_1, ..., AB_p, BA_1, ..., BA_q, the enhanced double digest problem
V asks for a valid permutation (π_A, π_B) of the elements in A and B such
that the following can be achieved. When the fragments â_i for a_i ∈ A and b̂_j
for b_j ∈ B are plotted on the same line according to the order given by π_A and
π_B, a set of subfragments is formed due to overlapping. The multiset C of the
lengths of these subfragments is required to be equal to
$\biguplus_{i=1}^{p} AB_i = \biguplus_{j=1}^{q} BA_j$.
In addition,

• for every a_i ∈ A (respectively, b_j ∈ B), AB_i (respectively, BA_j) is equal
to the multiset of the lengths of the subfragments which overlap with â_i
(respectively, b̂_j).

Note that an instance of this problem may have no solution or more than
one valid permutation. The algorithms given in Section 3 can recover all valid
permutations, if any exists.



3 An Efficient Algorithm 

Unless otherwise stated, this section assumes that C has no duplicates. Let
n = |C|. This section shows that the EDD problem V can be solved in O(n)
time.

Section 3.1 formulates the EDD problem as a graph problem. Section 3.2
describes the linear-time algorithm. Section 3.3 discusses how to generalize this
linear-time algorithm to the case where C may contain a small number of
duplicates.

3.1 A Graph Representation 

Given A, B, AB_1, ..., AB_p, BA_1, ..., BA_q, we construct an undirected graph G
as follows.

[Figure 2, panel (b): the path 6^B - 6^C - 9^A - 3^C - 38^B - 8^C - 37^A - 29^C - 46^B - 17^C - 17^A, with the danglers 12^C - 12^A and 15^C - 15^A hanging on 38^B.]

Fig. 2. The graph G in (a) is constructed from the example in Figure 1. G can be
redrawn into a spanning tree as shown in (b). The superscript A, B or C of each node
denotes whether the node belongs to A, B or C.



• The node set of G is A ∪ B ∪ C.

• For every a_i ∈ A and every x ∈ C, (a_i, x) ∈ G if x ∈ AB_i.

• For every b_j ∈ B and every x ∈ C, (b_j, x) ∈ G if x ∈ BA_j.

From the definition, we can observe that G satisfies the following lemma.

Lemma 1. G is connected. For each node in A ∪ B, its degree is at least 1 and
it is adjacent to nodes in C only. Also, every node in C connects to exactly one
node in A and one node in B.

Proof. Straightforward based on the assumption that C has no duplicates.
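
The construction of G can be sketched in a few lines. This is a minimal illustration under our own conventions, not the authors' implementation: the function name `build_graph` and the node labels `("A", i)`, `("B", j)` and `("C", x)` are hypothetical, fragments are identified by their index, and C is assumed to be duplicate-free so that each length x names a unique node.

```python
def build_graph(ab, ba):
    """Adjacency sets of G; ab[i] lists AB_i and ba[j] lists BA_j."""
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for i, group in enumerate(ab):
        for x in group:
            add_edge(("A", i), ("C", x))   # edge (a_i, x) because x is in AB_i
    for j, group in enumerate(ba):
        for x in group:
            add_edge(("B", j), ("C", x))   # edge (b_j, x) because x is in BA_j
    return adj

# Data of the running example (Figure 1).
ab = [[3, 6], [12], [15], [17], [8, 29]]
ba = [[6], [3, 8, 12, 15], [17, 29]]
graph = build_graph(ab, ba)

# Lemma 1: every node of C has exactly one neighbour in A and one in B.
lemma1_holds = all(
    sorted(kind for kind, _ in graph[v]) == ["A", "B"]
    for v in graph if v[0] == "C"
)
```

On the example the graph has 15 nodes and 14 edges, which agrees with the count of 2n edges for n = |C| = 7 used later in the running-time analysis.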

If V has a valid permutation, G has two more properties as stated in
Lemma 2. Figure 2 illustrates an example. A diameter of a tree is a path with
the largest number of edges. A dangler is a 2-node-long path. Given a tree T, a
subtree τ of T is said to be hanged on a path P in T if τ is a tree in the spanning
forest T − P.

Lemma 2. If V has a valid permutation, then the following statements hold.

1. G is a spanning tree.

2. For any diameter S of G, the subtrees hanged on S must be danglers.

Proof.

Statement 1. To prove by contradiction, suppose that G contains a cycle D.
By the construction of G, D must be of the form

a_{i_1}, c_{k_1}, b_{j_1}, c_{k_2}, a_{i_2}, c_{k_3}, b_{j_2}, c_{k_4}, ..., a_{i_z}, c_{k_{2z-1}}, b_{j_z}, c_{k_{2z}}, a_{i_{z+1}},

where a_{i_{z+1}} = a_{i_1}; a_{i_1}, ..., a_{i_z} ∈ A; b_{j_1}, ..., b_{j_z} ∈ B; and c_{k_1}, ..., c_{k_{2z}} ∈ C.

By definition, if a_i, c_k, b_j is a path in G, then â_i and b̂_j overlap by c_k in any
valid permutation of V. Thus, for 1 ≤ ℓ ≤ z − 1, the existence of the subpath
a_{i_ℓ}, ..., a_{i_{ℓ+2}} of D in G means that b̂_{j_ℓ} overlaps with â_{i_ℓ} and â_{i_{ℓ+1}}, and
b̂_{j_{ℓ+1}} overlaps with â_{i_{ℓ+1}} and â_{i_{ℓ+2}}. To enable both b̂_{j_ℓ} and b̂_{j_{ℓ+1}} to
overlap with â_{i_{ℓ+1}}, â_{i_{ℓ+1}} must be in the middle of â_{i_ℓ} and â_{i_{ℓ+2}} for
1 ≤ ℓ ≤ z − 1. Consequently, for 1 ≤ ℓ ≤ z − 1, â_{i_{ℓ+1}} is in the middle of â_{i_1}
and â_{i_{z+1}}, which is impossible.

Statement 2. For any diameter S of G, we show that every subtree τ hanged
on S must be a dangler. First, τ must be hanged on S at a node in A ∪ B.
Otherwise, if τ is hanged on S at a node c ∈ C, c has degree greater than 2,
contradicting Lemma 1. Then, τ has more than one node because the root of τ
is a node in C and must be of degree 2. If τ cannot have more than 2 nodes,
Statement 2 follows.




Fig. 3. In this example, all a_i ∈ A, b_j ∈ B and c_k ∈ C.



To prove by contradiction, suppose that τ has more than two nodes. Without
loss of generality, assume that τ is hanged on S at a node a_{i_0} ∈ A and the root
of τ is a node c_{k_3} ∈ C. Note that c_{k_3} has another neighbour, say b_{j_3}, from B.
If τ contains more than two nodes, b_{j_3} must have a child, say c, from C and
c must have a child, say a_{i_3}, from A. Thus, τ must have a root-to-leaf path
of length more than 4. Then, the two paths from a_{i_0} to both ends of S must
be of length more than 4. Otherwise, S cannot be a diameter of G. From those
observations, G has the pattern shown in Figure 3. According to the pattern,
b_{j_1}, b_{j_2} and b_{j_3} overlap with a_{i_0}. Therefore, in any valid permutation, one of
b_{j_1}, b_{j_2} and b_{j_3}, say b_{j_2}, must be in the middle of the other two fragments and
b_{j_2} can only overlap with a_{i_0}. However, according to the pattern in Figure 3, for
ℓ = 1, 2, 3, b_{j_ℓ} overlaps with another fragment a_{i_ℓ}, reaching a contradiction.

Now, we know that if V has a valid permutation, G satisfies the two properties
of Lemma 2. The remainder of this section shows that the converse of this
statement is also true. Suppose that G is a spanning tree with a diameter S such
that all the subtrees hanged on S are danglers. We define π_C to be a permutation
on C formed by a search defined below.

Dangler-first search: Traverse G starting from one end of S to the other end of
S; read off the nodes in C on S; whenever meeting any node x with degree
greater than 2, read off the nodes in C in the danglers hanged on S at x in
any order and continue to traverse S.



Lemma 3 . The elements in each ABi form a consecutive subsequence in 'kq- 
Similarly, the elements in each BAj form a consecutive subsequence in ttc- 

Proof. For each i, if ABi contains only one element, then the lemma follows. 
Otherwise, Ui is of degree at least 2. Then, must be on the diameter S. Let 
c and c' be elements in ABi which are the two neighbours of on S. The 
remaining nodes in ABi must be located in the danglers hanged on S at a^. By 
dangler-first search, all the elements in ABi must form a consecutive subsequence 
in TTc- By symmetry, for each j, the elements in BAj must form a consecutive 
subsequence in irc- 

By Lemma 3, $\pi_C$ can be partitioned into p subintervals such that the rth 
interval contains the elements in $AB_{i_r}$, for r = 1, ..., p. Let $\pi_A$ be 
the permutation $(a_{i_1}, \ldots, a_{i_p})$. Similarly, $\pi_C$ can be partitioned 
into q intervals such that the sth interval contains the elements in $BA_{j_s}$, 
for s = 1, ..., q. Let $\pi_B$ be the permutation $(b_{j_1}, \ldots, b_{j_q})$. 
We call $(\pi_A, \pi_B)$ the induced permutation of $\pi_C$. 
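As a small illustration of how the induced permutation can be read off, the following Python sketch scans the permutation of C once and records each group the first time its interval starts. The function name, the dictionary layout and the instance are assumptions made for illustration; the sketch relies on each group being consecutive, as Lemma 3 states.

```python
def induced_permutation(pi_C, groups):
    """Order the groups (fragments) by the position of their interval in pi_C.

    `groups` maps a fragment name to the set of elements of C it contains.
    Assumes every group occupies a consecutive interval of pi_C.
    """
    owner = {c: name for name, members in groups.items() for c in members}
    order = []
    for c in pi_C:
        if not order or order[-1] != owner[c]:
            order.append(owner[c])
    return order


# Hypothetical instance with four subfragments.
pi_C = ["c1", "c2", "c3", "c4"]
AB = {"a1": {"c1", "c2", "c3"}, "a2": {"c4"}}
BA = {"b1": {"c1"}, "b2": {"c2"}, "b3": {"c3", "c4"}}
pi_A = induced_permutation(pi_C, AB)
pi_B = induced_permutation(pi_C, BA)
```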

Lemma 4. The induced permutation $(\pi_A, \pi_B)$ of $\pi_C$ is a valid 
permutation of V. 

Proof. Suppose the lengths from A, B and C are plotted on the same line 
according to the order given by $\pi_A$, $\pi_B$ and $\pi_C$, respectively. Consider 
the stripes formed from A and C. By the Fact stated earlier and Lemma 3, for each 
i, $a_i$ overlaps with c for all $c \in AB_i$. By symmetry, for each j, $b_j$ 
overlaps with c for all $c \in BA_j$. Then, by the definition of the EDD problem, 
$(\pi_A, \pi_B)$ is a valid permutation. 



Theorem 1. Given the enhanced double digest problem V and its corresponding 
graph G, V has a valid permutation if and only if G satisfies the two properties 
in Lemma 2. 

Proof. The only-if part follows from Lemma 2. The if part follows from Lemma 4. 
3.2 A Linear-Time Algorithm for a Duplicate-Free C 

This section describes how to compute a valid permutation of V in $O(n)$ time. 
The algorithm is as follows. 

Algorithm Enhanced-Double-Digest 

1. Construct the graph G corresponding to V. 

2. If G does not satisfy the two properties in Lemma 2, then return “no valid 
permutation”. 

3. Find the permutation $\pi_C$ using dangler-first search. 

4. Find the induced permutation $(\pi_A, \pi_B)$ of $\pi_C$. 

5. Return $(\pi_A, \pi_B)$. 

Lemma 5. Algorithm Enhanced-Double-Digest correctly finds a valid permutation 
in $O(n)$ time. 

Proof. First, by Lemma 2 and Theorem 1, Enhanced-Double-Digest is correct. 
As for its time complexity, Step 1 requires $O(n)$ time as G contains 2n edges 
and we can find each edge in $O(1)$ time. Step 2 checks whether G satisfies the 
two properties in Lemma 2. For the first property, we can determine whether a 
graph is a spanning tree in $O(n)$ time. For the second property, we first 
compute a diameter of the tree in linear time and then verify the property by 
detecting whether the subtrees hanging on the diameter are danglers. Thus, 
Step 2 requires $O(n)$ time. Step 3 finds $\pi_C$ using dangler-first search. Since 
the search scans every node in G once, it runs in $O(n)$ time. Step 4 finds the 
induced permutation $(\pi_A, \pi_B)$ of $\pi_C$ in $O(n)$ time. In summary, a valid 
permutation of V can be computed in $O(n)$ time. 
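The two linear-time checks used in Step 2 can be illustrated as follows. This is a minimal sketch, not the authors' implementation: it assumes the graph is an adjacency dictionary, tests the tree property by counting edges and checking connectivity, and finds a diameter with the standard double breadth-first search. The dangler test itself is omitted because the definition of a dangler lies outside this section. All names and the instance are hypothetical.

```python
from collections import deque


def bfs_farthest(adj, start):
    """Return a node farthest from `start` and the BFS parent map."""
    parent = {start: None}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()  # nodes leave the queue in nondecreasing distance
        for nxt in adj[last]:
            if nxt not in parent:
                parent[nxt] = last
                queue.append(nxt)
    return last, parent


def is_tree(adj):
    """A connected graph with |V| - 1 edges is a tree."""
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    _, parent = bfs_farthest(adj, next(iter(adj)))
    return edges == len(adj) - 1 and len(parent) == len(adj)


def diameter_path(adj):
    """Double BFS: the farthest node from any node is one end of a diameter."""
    end1, _ = bfs_farthest(adj, next(iter(adj)))
    end2, parent = bfs_farthest(adj, end1)
    path = []
    node = end2
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


# Hypothetical instance graph: a spine of seven nodes plus one hanging subtree.
adj = {
    "a1": ["c1", "c2", "c3"], "a2": ["c4"],
    "b1": ["c1"], "b2": ["c2"], "b3": ["c3", "c4"],
    "c1": ["a1", "b1"], "c2": ["a1", "b2"],
    "c3": ["a1", "b3"], "c4": ["a2", "b3"],
}
cycle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
```

Both routines touch each edge a constant number of times, which matches the linear bound claimed for Step 2.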

To get all valid permutations of V, we can modify the dangler-first search to 
return all possible permutations $\pi_C$ in a straightforward manner. The induced 
permutations $(\pi_A, \pi_B)$ of all such $\pi_C$ are all the valid permutations of V. 

3.3 A General Algorithm for C with Few Duplicates 

The algorithm Enhanced-Double-Digest in Section 3.2 can solve the EDD problem 
if C contains no duplicates. Here, we give an algorithm which works without 
this assumption. 

First, we consider the following example. 

• $A = \{a_1 = 18, a_2 = 19\}$; $B = \{b_1 = 4, b_2 = 5, b_3 = 7, b_4 = 8, b_5 = 13\}$; 
$AB_1 = \{5, 6, 7\}$; $AB_2 = \{4, 7, 8\}$; 

• $BA_1 = \{4\}$; $BA_2 = \{5\}$; $BA_3 = \{7\}$; $BA_4 = \{8\}$; $BA_5 = \{6, 7\}$. 

In this example, there are two 7’s in $C = \bigcup_i AB_i = \bigcup_j BA_j$. These 
two 7’s in fact represent two different subfragments in the target DNA sequence. 
To distinguish them, let the copy of 7 in $AB_1$ be $7_1$ and that in $AB_2$ be 
$7_2$. Since 7 also belongs to $BA_3$ and $BA_5$, there are two possible 
combinations, namely, (a) $7_1 \in BA_5$ and $7_2 \in BA_3$ and (b) $7_1 \in BA_3$ 
and $7_2 \in BA_5$. Figures 4(a) and 4(b) illustrate the graph G for both cases; 
from these two graphs G, we can obtain a valid permutation from combination (a). 
Therefore, we can handle duplicates in C by giving them different subscripts. 
Then, all the elements in C are different and we can solve the enhanced double 
digest problem using the algorithm Enhanced-Double-Digest in Section 3.2. More 
precisely, we have the following algorithm. 

1. If C contains duplicates, then we assign a unique subscript to each duplicate. 

2. For each possible combination of the subscripts in the duplicates, we execute 
Enhanced-Double-Digest to compute a valid permutation. 

Let $\ell$ be the number of duplicates in C. The above algorithm executes 
Enhanced-Double-Digest at most $\ell!$ times. Therefore, a valid permutation can 
be computed in $O(\ell!\, n)$ time. Thus, if $\ell$ is constant, the generalized 
algorithm still runs in linear time. 

4 The Enhanced Double Digest Problem Is NP-hard 

This section proves the NP-hardness of the enhanced double digest problem by 
a reduction from the Hamiltonian Path problem [2]. 

Given an undirected graph H, we show that in polynomial time, we can 
construct an EDD instance Q so that H contains a Hamiltonian path if and only 
if Q has a valid permutation. For ease of proof, we augment H with two new 
nodes t and z. All nodes originally in H have edges to t. In addition, we add an 
edge (t, z) to H. Note that the original H contains a Hamiltonian path if and 
only if the amended H has a Hamiltonian path. Let $\ell$ be the number of nodes 
in H. Assume that the nodes in H are labeled by $\{1, 2, \ldots, \ell\}$. For each 
node v, let k(v) be the number of neighbours of v. Let $v' = v + \ell$. The EDD 
instance Q is given by the following length information. Note that this length 
information can be constructed from H in polynomial time. 
- $A = \{a_v \mid v \in H\}$, where $a_z = t'$, $a_t = t + \sum_{u \in H - \{z,t\}} u'$, 
and $a_v = v + \sum_{(u,v) \in H} u'$ for $v \ne z, t$. Also, $AB_z = \{t'\}$, 
$AB_t = \{u' \mid u \in H - \{z,t\}\} \cup \{t\}$, and 
$AB_v = \{u' \mid (u,v) \in H\} \cup \{v\}$ for $v \ne z, t$. 

- $B = \{b_v, b_{v(1)}, \ldots, b_{v(k(v)-1)} \mid v \in H - \{z\}\}$, where 
$b_v = v + v'$ and $b_{v(i)} = v'$ for all $v \in H - \{z\}$ and all 
$i \le k(v) - 1$. Also, $BA_v = \{v, v'\}$ and $BA_{v(i)} = \{v'\}$. 
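To make the construction concrete, here is a hedged Python sketch that builds the length information from an augmented graph H. It rests on one reading of the construction, chosen so that the subfragment multisets on the A side and the B side agree: $a_z = t'$; $AB_t$ holds t and u' for every u other than z and t; every other $AB_v$ holds v and u' for each neighbour u of v; and each $v \ne z$ contributes $b_v = v + v'$ plus k(v) − 1 copies of v'. The function name, the key layout and the example graph are assumptions for illustration only.

```python
from collections import Counter


def build_edd_instance(num_nodes, edges, t, z):
    """Build (A, B, AB, BA) from the augmented graph H on nodes 1..num_nodes."""
    ell = num_nodes
    adj = {v: set() for v in range(1, ell + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def prime(v):
        return v + ell  # v' = v + ell

    AB, BA = {}, {}
    for v in adj:
        if v == z:
            AB[v] = [prime(t)]
        elif v == t:
            AB[v] = [t] + [prime(u) for u in sorted(adj[t]) if u != z]
        else:
            AB[v] = [v] + [prime(u) for u in sorted(adj[v])]
    for v in adj:
        if v == z:
            continue
        BA[(v, 0)] = [v, prime(v)]          # the fragment b_v
        for i in range(1, len(adj[v])):      # the fragments b_{v(i)}, i <= k(v) - 1
            BA[(v, i)] = [prime(v)]

    A = {v: sum(parts) for v, parts in AB.items()}
    B = {key: sum(parts) for key, parts in BA.items()}
    return A, B, AB, BA


# Hypothetical example: original H is the single edge (1, 2); t = 3, z = 4.
A, B, AB, BA = build_edd_instance(4, [(1, 2), (1, 3), (2, 3), (3, 4)], t=3, z=4)
sub_A = Counter(x for parts in AB.values() for x in parts)
sub_B = Counter(x for parts in BA.values() for x in parts)
```

Under this reading the two digests partition the same total length and yield the same multiset of subfragments, which is a necessary condition for any EDD instance.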



Lemma 6. H has a hamiltonian path if and only if there is a valid permutation 
for Q. 

Proof. The two directions are proved as follows. 




Fig. 5. The permutations $\pi_A$ and $\pi_B$ of A and B, respectively. 



($\Rightarrow$) Let $u_1, u_2, \ldots, u_{\ell-2}, t, z$ be a Hamiltonian path in H. 
Let $\pi_A$ and $\pi_B$ be permutations of A and B as shown in Figure 5. It is easy 
to check that $(\pi_A, \pi_B)$ is a valid permutation of Q. 

($\Leftarrow$) Let $(\pi_A, \pi_B)$ be a valid permutation of Q. The remainder of 
this proof shows that the ordering of the lengths in $\pi_A$ defines a Hamiltonian 
path in H. 

Assume the lengths from A are plotted on a line according to the order given 
by $\pi_A$ and similarly, the lengths from B are also plotted on this line according 
to $\pi_B$. For each $v \in H$, the line fragment corresponding to $a_v \in A$ is 
called $\hat{a}_v$. For each $v \in H - \{z\}$, the line fragment corresponding to 
$b_v \in B$ is called $\hat{b}_v$. 

For every $v \in H - \{z\}$, since $BA_v = \{v, v'\}$, $\hat{b}_v$ overlaps with two 
consecutive line fragments from A; in addition, the overlapping regions between 
$\hat{b}_v$ and these two line fragments must be of length v and v', respectively. 
Observe that $v \in AB_v$ and $v \notin AB_u$ for all $u \ne v$. One of these two 
fragments, which overlaps with $\hat{b}_v$, must be $\hat{a}_v$. The other line 
fragment can be $\hat{a}_u$ for any $u \in H$ with $v' \in AB_u$, i.e., 
$(v, u) \in H$. 

Let $\pi_A = (a_{u_1}, \ldots, a_{u_\ell})$. From the above argument, we know that, 
for every two consecutive line fragments $\hat{a}_{u_i}$ and $\hat{a}_{u_{i+1}}$, 
there exists a fragment $\hat{b}_v$ (where v is either $u_i$ or $u_{i+1}$) which 
overlaps with both $\hat{a}_{u_i}$ and $\hat{a}_{u_{i+1}}$. The above argument also 
implies that $(u_i, u_{i+1}) \in H$. Thus, $u_1, \ldots, u_\ell$ forms a path in H. 
As $u_1, \ldots, u_\ell$ contains all the $\ell$ nodes of H, this path is a 
Hamiltonian path. 
5 Further Research Directions 

This work can be extended in several directions. One direction is to design a series 
of laboratory procedures that can actually produce the input length information 
in the required form. While separating out DNA sequences by length seems 
to be possible with current laboratory techniques, we still face the problem of 
separating different DNA fragments having the same length. Another direction 
is to consider the problem of more than 2 digesting enzymes. Using multiple 
enzymes could help resolve the issue of multiple solutions that arise when there 
are danglers or duplicate subfragment lengths. Also, the extra input may actually 
make the problem solvable in a shorter period of time. The third direction is to 
have a probabilistic analysis of the number of duplicates in C, when the length 
of the target DNA sequence is given. Lastly, this paper does not address the 
issue of noise in the length data. From the practical point of view, handling the 
noise problem is quite important. 
Abstract. In molecular biology, it is said that two biological sequences 
tend to have similar properties if they have similar 3-D structures. Hence, 
it is very important to find not only similar sequences in the string sense, 
but also structurally similar sequences from databases. In this paper, we 
propose a new data structure that is a generalization of a parameterized 
suffix tree (p-suffix tree for short) introduced by Baker. This data 
structure can be used for finding structurally related patterns of RNA or 
single-stranded DNA. Furthermore, we propose an $O(n(\log|\Sigma| + \log|\Pi|))$ 
on-line algorithm for constructing it, where n is the sequence length, $|\Sigma|$ 
is the size of the normal alphabet, and $|\Pi|$ is that of the alphabet called 
“parameter,” which is related to the structure of the sequence. Our 
algorithm achieves linear time when it is used to analyze RNA and 
DNA sequences. Furthermore, as an algorithm for constructing the 
p-suffix tree, it is the first on-line algorithm, though the computing bound 
of our algorithm is the same as that of Kosaraju’s best-known algorithm. 
The results of computational experiments using actual RNA and DNA 
sequences are also given to demonstrate our algorithm’s practicality. 



1 Introduction 

The 3-D structure of a biological sequence plays a major role in determining 
its functions and properties, and sequences that have similar structures often 
have similar functions, even if the sequences themselves are not similar. But 
it is very difficult to predict the structure of a given sequence correctly and 
efficiently. Hence it seems to be still harder to find structurally similar regions 
among several biological sequences or to find a set of frequently appearing and 
structurally similar regions in a given sequence. Thus molecular biologists often 
search only for similar, or highly conserved, regions in DNA, RNA or protein 
sequences to find regions with similar functions, because similar sequences 
tend to have the same structure. Though many such methods are very 
fast, they do not detect regions that are structurally similar to each other but 
not similar in the string sense. 

RNA sequences consist of four kinds of bases: A (adenine), U (uracil), C (cy- 
tosine), and G (guanine). Note that in DNA, T (thymine) is present instead of U. 
A and U (T for DNA) are said to be complements of each other, and C and G are 
also complementary bases. RNA and single-stranded DNA sequences often form 
some structures by combining two complementary base pairs. It is known that 
double-stranded DNA sequences sometimes form such structures by becoming 
single-stranded locally. Note that a base sometimes combines with more than one 
complementary base; the triplex structure is a famous example. Many 
computational studies have addressed predicting RNA secondary structure, 
comparing a new sequence with a known RNA structure, searching for known RNA 
or DNA structures in large databases, and so on [2, 10, 12, 13, 16, 17, 19, 20]. 
But there has been no appropriate method that can efficiently mine an unknown 
important RNA structure from a large data set in linear time, which is the aim 
of the algorithm presented in this paper. 



Let us consider the two RNA sequences in Figure 1 (1). The two sequences are 
not at all similar to each other: there are no identical bases in identical positions. 
In sequence 1, A’s are located at the 1st, 3rd, 8th, and 15th positions. In sequence 
2, C’s are located at the same positions as A’s in sequence 1. Similarly, A’s, U’s, 
and G’s in sequence 2 are located at the same positions as G’s, C’s, and U’s in 
sequence 1, respectively. Recall that A and U can combine with each other, and 
that C and G can also combine with each other. We then notice the following 
fact: If two bases in one of these sequences can combine with each other, then 
in the other sequence, the two bases at the same two positions are also able to 
combine with each other. This implies that a structure that can be formed by 
one of the sequences can also be formed by the other sequence. Thus there is a 
strong possibility that these two sequences have the same structure, and 
consequently may have similar properties. For example, Figure 1 (2) shows one of 
the structures that can be formed by sequence 1. It is easy to see that it can 
also be formed by sequence 2. 
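The observation about the two example sequences can be checked mechanically. The sketch below is an illustration rather than part of the paper's method: it derives the position-wise base mapping between two equal-length RNA sequences and accepts it only if it is one-to-one and sends complementary bases to complementary bases. The function name and the dictionary `COMPLEMENT` are assumptions introduced here.

```python
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def structural_mapping(s1, s2, complement=COMPLEMENT):
    """Return the base mapping s1 -> s2 if it is one-to-one and preserves
    complementarity, otherwise None."""
    if len(s1) != len(s2):
        return None
    fwd, bwd = {}, {}
    for x, y in zip(s1, s2):
        # Every base must always map to, and come from, the same partner.
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return None
    for x, y in fwd.items():
        cx, cy = complement[x], complement[y]
        # The complement of x must map to the complement of y.
        if fwd.get(cx, cy) != cy or bwd.get(cy, cx) != cx:
            return None
    return fwd


seq1 = "AUAUCGUAUGGCCGAGCC"
seq2 = "CGCGUAGCGAAUUACAUU"
mapping = structural_mapping(seq1, seq2)
```

For the two sequences of Figure 1 the mapping sends A, U, C, G to C, G, U, A respectively, so every pair of positions that can bind in one sequence can also bind in the other.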



In this paper, we first introduce suffix trees and p-suffix trees as prelimi- 
naries. We also briefly describe Ukkonen’s algorithm, on which our algorithm 
is based. We then propose a new data structure called an s-suffix tree by gen- 
eralizing the p-suffix tree. We also discuss how to describe structural patterns 
of RNA or DNA here. Using the s-suffix tree, we can efficiently find some set 
of substrings in some given sequence that might be structurally similar, query 
substrings that might be structurally similar to another given string, and so 
on. We also propose an efficient on-line algorithm for constructing an s-suffix 
tree based on Ukkonen’s algorithm. Finally, we give the results of computational 
experiments using several HIV RNA complete sequences and very large DNA 
sequences of E. coli (Escherichia coli). 



Sequence 1: AUAUCGUAUGGCCGAGCC 
Sequence 2: CGCGUAGCGAAUUACAUU 

(1) Example sequences 



complementary base pair 




(2) Candidate structure 



Fig. 1. Example of sequences that are highly likely to have the same structure 
2 Preliminaries 



2.1 Suffix Trees and p-Suffix Trees 



The suffix tree of a string $S \in \Sigma^n$ is the compacted trie of all the 
suffixes of S$, where the end marker $ does not occur in $\Sigma$ 
[9, 10, 15, 18, 21]. This data structure is very useful for various problems 
in sequence pattern matching. Using it, we can query a substring of length m 
in $O(m \log|\Sigma|)$ time, we can find frequently appearing substrings in a given 
sequence in linear time, we can find a common substring of many sequences, 
also in linear time, and so on. 

The tree has n + 1 leaves, and each internal node has more than one child. 
Each edge is labeled with a non-empty substring of S$, and no two edges out of 
a node can have labels that start with the same character. Each node is labeled 
with the concatenated string of edge labels on the path from the root to the 
node, and each leaf has a label that is a different suffix of S$. Because each 
edge label is represented by the first and the last indices of the corresponding 
substring in S$, the data structure can be stored in $O(n)$ space. 

This data structure was first proposed by Weiner, who gave an $O(n|\Sigma|)$ 
algorithm for constructing it, where n is the string length and $|\Sigma|$ is the 
size of the alphabet. McCreight improved it by giving an $O(n \log|\Sigma|)$ 
algorithm. After that, Ukkonen proposed an on-line $O(n \log|\Sigma|)$ algorithm, 
which processes a string character by character from left to right. Recently, 
Farach proposed an $O(n)$ algorithm for an integer alphabet $\{1, \ldots, n\}$. 

A parameterized string, or a p-string for short, is a string over two alphabets 
$\Sigma$ and $\Pi$, where $\Sigma$ is an ordinary alphabet and $\Pi$ is a set of 
parameters. Two p-strings are said to match if they are the same except for a 
one-to-one correspondence between the characters in $\Pi$ occurring in them. For 
example, the two p-strings ACxBCyzyAzxC and ACyBCzxzAxyC match 
($\Sigma$ = {A, B, C} and $\Pi$ = {x, y, z}). 

Following Baker, we define prev(S) for any p-string S as follows: 



Definition 1. Let N be the set of nonnegative integers. For any parameter 
$x \in \Pi$ in string $S \in (\Sigma \cup \Pi)^*$, replace it by an integer in N 
that equals the number of positions between it and the nearest x to the left, 
except for the leftmost x, which is replaced by 0. We let the obtained string in 
$(\Sigma \cup N)^*$ be prev(S). 

For example, prev(ACxBCyzyAzxC) = AC0BC002A38C. The p-suffix tree of a 
p-string S is the compacted trie of $\mathrm{prev}(\mathrm{suffix}_i(S))$ for all 
positions i, where $\mathrm{suffix}_i(S)$ denotes the suffix of S that starts at 
position i. Baker [4, 5, 6] proposed this data structure and showed that it can 
be constructed in $O(n(|\Pi| + \log|\Sigma|))$ time. Kosaraju improved the time by 
giving an $O(n(\log|\Pi| + \log|\Sigma|))$ algorithm. Note that both of the 
algorithms are based on McCreight’s suffix tree construction algorithm and that 
neither supports on-line computation. This paper will give an on-line algorithm 
for the same task, based on Ukkonen’s algorithm. 
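The prev encoding of Definition 1 is easy to compute with a table of last-seen positions. The following Python sketch is illustrative only; the function name and the representation of the result as a list mixing characters and integers are assumptions made here.

```python
def prev_encode(s, params):
    """Replace each parameter by its distance to the previous occurrence of
    the same parameter (0 for the leftmost occurrence); keep fixed symbols."""
    last = {}
    out = []
    for pos, ch in enumerate(s):
        if ch in params:
            out.append(pos - last[ch] if ch in last else 0)
            last[ch] = pos
        else:
            out.append(ch)
    return out


params = set("xyz")
enc1 = prev_encode("ACxBCyzyAzxC", params)
enc2 = prev_encode("ACyBCzxzAxyC", params)
```

Two p-strings match exactly when their prev encodings are equal, which is why the two example strings above produce the same list.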

In the following sections, we use the following definitions. In a suffix tree, let 
parent(u) be the parent node of node u, let $\sigma_u$ be the string label of node 
u, and let node($\alpha$) be the node u in the tree such that $\sigma_u = \alpha$, 
if it exists. The suffix link of u is a link to the node with label $\alpha$ if u 
is not the root and has a label of $c\alpha$, where c is a single character. It is 
known that a suffix link always exists for any u except for the root in a suffix 
tree [10, 15, 18]. If u is the root, we let its suffix link be u itself. Let sl(u) 
be the suffix link of u. 

2.2 Ukkonen’s Suffix Tree Construction Algorithm 

In this section, we briefly describe Ukkonen’s suffix tree construction algorithm. 

The implicit suffix tree of S is the compacted trie of all the suffixes of S, in 
which a label for an edge that ends at a leaf is represented by only the first 
index of the label. Let $\mathrm{prefix}_i(S)$ be the prefix of S whose length is 
i, let $T_i$ denote the implicit suffix tree of $\mathrm{prefix}_i(S\$)$, and let 
n = |S|. Ukkonen’s algorithm consists of n + 1 phases, and in the ith phase, we 
construct an implicit suffix tree $T_i$ from $T_{i-1}$. 

In the ith phase, we construct a new node 
$u = \mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_i(S)))$ for all $1 \le j \le i$ 
in this order if there is no locus for $\mathrm{suffix}_j(\mathrm{prefix}_i(S))$ in 
the tree. When we construct u, if there is no node with a label of 
$\mathrm{suffix}_j(\mathrm{prefix}_{i-1}(S))$, we must also construct a new internal 
node at the appropriate locus and let it be the parent of u. We call this 
procedure for a single j the jth extension of the ith phase. 

Notice that we do not have to construct node 
$u = \mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_i(S)))$ if 
$v = \mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_{i-1}(S)))$ was a leaf in the 
previous phase, because of the definition of the implicit suffix tree: $\sigma_v$ 
is $\mathrm{suffix}_j(\mathrm{prefix}_i(S))$ in this phase. Thus, if there is a leaf 
for each of $\mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_{i-1}(S)))$ for all 
$j \le k$ in phase i − 1, we can begin by constructing 
$\mathrm{node}(\mathrm{suffix}_{k+1}(\mathrm{prefix}_i(S)))$ in this phase. 
Furthermore, if there is a locus for $\mathrm{suffix}_j(\mathrm{prefix}_i(S))$ for 
some j, it is easy to see that there already exist loci for 
$\mathrm{suffix}_k(\mathrm{prefix}_i(S))$ (k > j) too, and that there is no need to 
construct nodes for them in this phase. 

Ukkonen’s algorithm, like McCreight’s algorithm, maintains at each node u of 
the suffix tree a suffix link sl(u). In any phase, we construct nodes 
$u_j = \mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_i(S)))$ for several 
consecutive j’s and $u'_j = \mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_{i-1}(S)))$ 
if necessary, in the manner described above. Notice that $u_{j+1} = sl(u_j)$ and 
$u'_{j+1} = sl(u'_j)$ if they exist. For the last $u_j$ to be constructed in this 
phase, we will check the locus for $\mathrm{suffix}_{j+1}(\mathrm{prefix}_i(S))$, 
which is $sl(u_j)$, in the next extension according to the algorithm. Thus we will 
know within the phase the suffix links of all the nodes constructed in the same 
phase. In this way, we can maintain the suffix links. 

Using the suffix links, we can construct node 
$u_j = \mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_i(S)))$ faster: It is easy to 
see that $sl(\mathrm{parent}(u_{j-1}))$ must be an ancestor of $u_j$, and we can 
find the locus of $\mathrm{suffix}_j(\mathrm{prefix}_{i-1}(S))$ by tracing edges 
from $sl(\mathrm{parent}(u_{j-1}))$. We call tracing from the suffix link to the 
target locus “scanning.” 

In this way, the algorithm achieves an $O(n \log|\Sigma|)$ time complexity. For 
more details of the algorithm and the analysis of the computing time bound, see 
[10] or [18]. 
3 Structural Suffix Tree 

3.1 s-Strings and s-Suffix Trees 

In this section we define s-strings and s-suffix trees, which are generalizations of 
p-strings and p-suffix trees. 

Definition 2. Let $\Sigma$ and $\Pi$ be disjoint finite alphabets. We call the 
characters in $\Sigma$ the “fixed symbols” and those in $\Pi$ the “parameters.” 
Some of the characters in $\Pi$ have one-to-one correspondences to other 
characters in $\Pi$: No two characters can be complements of one character, and 
two characters that correspond to each other are called complementary 
characters. A string in $(\Sigma \cup \Pi)^*$ is called a structural string, or 
s-string for short. Two s-strings S and S' are said to s-match if they satisfy 
the following two conditions: (1) there exists a one-to-one mapping from $\Pi$ to 
$\Pi$ such that S becomes S' as a result of applying it, and (2) if x is mapped 
to y in the mapping, then the complement of x is also mapped to the complement 
of y in the mapping. 

For example, if $\Sigma$ = {A, B}, $\Pi$ = {x, y, z, w}, and x and y are 
complements of z and w, respectively, then ABxByAzwz and ABwBxAyzy s-match, but 
ABxByAzwz and ABwBxAzyz do not. Note that if there are no complementary pairs 
in $\Pi$, an s-string is the same as a p-string. Note also that the complement of 
a given character can be accessed in $O(\log|\Pi|)$ time if the information is 
stored in a balanced tree data structure, which can be constructed in 
$O(|\Pi| \log|\Pi|)$ time. If $\Pi$ can be used as an index to a table, the 
complement can be obtained in $O(1)$ time. 

The problem of the RNA (or DNA) structural matching described in Section 1 
is the problem of s-matching in the following situation: $\Sigma = \emptyset$, 
$\Pi$ = {A, U, C, G}, and A and C are complementary characters of U and G, 
respectively. If two RNA sequences s-match each other, there is a high 
possibility that the two sequences have the same structure and that they may 
have similar properties as a result. For example, the two sequences in Figure 1 
(1) s-match. 

The following two encodings are useful for determining s-matching of two 
sequences. One is prev(S), which is already defined in Definition 1. The other is 
compl(S), defined as follows: 

Definition 3. Let N be the set of nonnegative integers (assumed disjoint from 
$\Sigma \cup \Pi$). For any parameter $x \in \Pi$ in string 
$S \in (\Sigma \cup \Pi)^*$, replace it by an integer in N that equals the number 
of positions between it and the nearest complementary character of x to the 
left. If there is no complementary character to the left, replace it by 0. 
Let compl(S) denote the obtained string in $(\Sigma \cup N)^*$. 

For example, compl(ABxByAzwz) = AB0B0A436 if $\Sigma$ = {A, B}, 
$\Pi$ = {x, y, z, w}, and x and y are complements of z and w, respectively. We 
can compute prev and compl encodings for a string S of size n in 
$O(n \cdot \min(\log n, \log|\Pi|))$ time and $O(n)$ space by means of a balanced 
tree structure, which can be computed on-line. If $\Pi$ is known and can be used 
as an index to a table of size $|\Pi|$, it is easy to see that these encodings 
can be computed in $O(n + |\Pi|)$ time and space. 

These two encodings are related to finding s-matches as follows: s-strings 
S and S' are an s-match if and only if prev(S) = prev(S') and compl(S) = 
compl(S'). Furthermore, it is easy to see the following lemma. Let prev(S)[i] 
and compl(S)[i] denote the ith characters of prev(S) and compl(S), respectively. 

Lemma 1. Consider a situation in which $\mathrm{prefix}_i(S)$ and 
$\mathrm{prefix}_i(S')$ are an s-match. In this situation, if 
prev(S)[i + 1] = prev(S')[i + 1] $\ne$ 0, then compl(S)[i + 1] = compl(S')[i + 1]. 
Similarly, if compl(S)[i + 1] = compl(S')[i + 1] $\ne$ 0, then 
prev(S)[i + 1] = prev(S')[i + 1]. 

This means that, when we check s-matches of strings, we do not have to 
see the other encoding if one of the encodings encodes a character as a 
non-zero number. Using this lemma, we can check s-matching by using the 
following s-encoding: 

Definition 4. For a given string S, compute prev(S) and compl(S). If 
prev(S)[i] = 0, replace it with −compl(S)[i], which is a nonpositive value. We 
call this new encoded string in $(\Sigma \cup I)^*$ (I: the set of integers) a 
structural encoding of S, or an s-encoding for short. 

The structural suffix tree of string S, or the s-suffix tree of S for short, 
is the compacted trie of the s-encoded strings of all the suffixes of S. Let 
sencode(S) denote the s-encoding of S. The s-strings S and S' are an s-match 
if and only if sencode(S) = sencode(S'). Let 
$\mathrm{ssuffix}_i(S) = \mathrm{sencode}(\mathrm{suffix}_i(S))$, and let 
$\mathrm{ssuffix}_i(S)[j]$ be the jth character of $\mathrm{ssuffix}_i(S)$. Notice 
that $\mathrm{ssuffix}_i(S)[j]$ is sometimes different from 
$\mathrm{sencode}(S)[i + j - 1]$: $\mathrm{ssuffix}_i(S)[j] = 0$ if 
$|\mathrm{sencode}(S)[i + j - 1]| > j$. Notice also that, if we have prev(S) and 
compl(S), we can obtain the value of $\mathrm{ssuffix}_i(S)[j]$ for any i and j 
in constant time. 
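The s-encoding and the constant-time lookup of a suffix character can be sketched as follows. This is one possible reading, not the authors' code: a back-reference is kept only if its distance is smaller than j, so that it still points inside the suffix, and the value is then combined as in Definition 4. All function names are assumptions of this sketch, and positions i and j are 1-based as in the text.

```python
def encodings(s, params, complement):
    """Compute prev(S) and compl(S) in one pass."""
    last, prev, compl = {}, [], []
    for pos, ch in enumerate(s):
        if ch in params:
            prev.append(pos - last[ch] if ch in last else 0)
            partner = complement.get(ch)
            compl.append(pos - last[partner] if partner in last else 0)
            last[ch] = pos
        else:
            prev.append(ch)
            compl.append(ch)
    return prev, compl


def sencode(s, params, complement):
    """Definition 4: use prev, falling back to -compl where prev is 0."""
    prev, compl = encodings(s, params, complement)
    return [(p if p != 0 else -c) if isinstance(p, int) else p
            for p, c in zip(prev, compl)]


def ssuffix_char(prev, compl, i, j):
    """j-th character of the s-encoding of the suffix starting at position i."""
    p, c = prev[i + j - 2], compl[i + j - 2]
    if not isinstance(p, int):
        return p                 # fixed symbol, unchanged
    p = p if p < j else 0        # drop references that leave the suffix
    c = c if c < j else 0
    return p if p != 0 else -c


params = set("xyzw")
complement = {"x": "z", "z": "x", "y": "w", "w": "y"}
S = "ABxByAzwz"
prev, compl = encodings(S, params, complement)
```

Comparing `sencode` of two strings decides whether they s-match, and `ssuffix_char` avoids re-encoding each suffix from scratch.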

Using the s-suffix tree, we can find some set of substrings in some given 
sequence that s-match each other in $O(n)$ time or query a substring that matches 
another given string in $O(n \log(|\Sigma| + |\Pi|))$ time. We can also find common 
s-substrings of two given sequences in linear time. 

3.2 Basic Algorithm 

We first describe a basic method for constructing the s-suffix tree based on 
Ukkonen’s algorithm. 

The implicit s-suffix tree of S is the compacted trie of all the s-encoded 
suffixes of S, in which a label for an edge that ends at a leaf is represented 
by only the first index of the label. Let $T_i$ denote the implicit s-suffix tree 
of $\mathrm{prefix}_i(S\$)$ for the given string S and an integer 
$0 \le i \le n + 1$, where n = |S|. Let node(S) denote the node whose label is 
the s-encoded string of S in this section. Like Ukkonen’s algorithm, our basic 
algorithm consists of n + 1 phases, and in the ith phase, we construct an 
implicit s-suffix tree $T_i$ from $T_{i-1}$. 
As in Ukkonen’s algorithm, we construct a new node 
$u = \mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_i(S)))$ for all $1 \le j \le i$ 
in this order if there is no locus for $\mathrm{suffix}_j(\mathrm{prefix}_i(S))$ in 
the tree in the ith phase. We call this procedure for a single j the jth 
extension, as in the description of Ukkonen’s algorithm. Ukkonen’s algorithm 
speeds up each phase by ignoring unnecessary extensions. In the s-suffix tree 
case, the unnecessary extensions are the same as in Ukkonen’s case. 

The major problem in constructing s-suffix trees by applying Ukkonen’s al- 
gorithm is that, as in Baker’s p-suffix tree construction algorithm, a node of an 
s-suffix tree does not always have explicit suffix links to another node. Consider 
a node $u = \mathrm{node}(c\alpha)$ in an s-suffix tree, where c is a single 
character and $\alpha$ is some s-encoded s-string. It is possible that the locus 
for $\alpha$ is not a node but a point on an edge. In this case, we let u’s suffix 
link sl(u) be this edge and call such a link an implicit suffix link. 

The implicit suffix links cause two problems. One is how to keep these 
implicit suffix links correct throughout the algorithm: the implicit suffix links 
must be updated if the corresponding edge is split. The other is how to analyze 
the number of scanned nodes in the algorithm. First, we deal with the former 
problem, and after that we discuss the latter problem. 

It is easy to see the following lemma related to implicit suffix links: 

Lemma 2. Let u be a node with an implicit suffix link and $d = |\sigma_u|$. Then 
the first s-encoded character of the label of any of the outgoing edges from u 
must be one of d, 0, and −d. Furthermore, if it is d, its corresponding compl 
value must be 0. 

We use the term ‘zero-node’ for a node with more than one outgoing edge whose 
label starts with one of d, 0, or −d, where d is the label length of the node, 
regardless of whether its suffix link is implicit or not. We also call an edge 
whose label has d, 0, or −d as its first s-encoded character a “positive 
zero-edge,” a “normal zero-edge,” or a “negative zero-edge,” respectively. 

The following lemmas related to zero-nodes and zero-edges can be easily seen: 



Lemma 3. On a path from the root to a leaf, there are at most $|\Pi|$ zero-nodes. 

Lemma 4. A positive zero-edge cannot be an ancestor of another positive 
zero-edge. Similarly, a negative zero-edge cannot be an ancestor of another 
negative zero-edge. 

It is easy to find the implicit suffix links of newly constructed nodes in the 
algorithm, as in Ukkonen’s algorithm. Consider the situation in the jth 
extension of the ith phase of the algorithm, when 
$u_j = \mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_i(S)))$ is constructed. Note 
that $u'_j = \mathrm{node}(\mathrm{suffix}_j(\mathrm{prefix}_{i-1}(S)))$ may be 
constructed at the same time if necessary. As in the case of constructing an 
ordinary suffix tree, we will check the locus of 
$\mathrm{suffix}_{j+1}(\mathrm{prefix}_{i-1}(S))$ in the next extension in the same 
phase, so we will soon find the suffix links of $u_j$ and $u'_j$. Hence we can 
conclude that every node other than the last leaf inserted and its parent has 
either an ordinary suffix link or an implicit suffix link. The problem is how to 
maintain these implicit suffix links. In the algorithm, we often split edges to 
add a new node. Thus we have to update each implicit link if the edge it links 
to is split into two edges by inserting a new node. We call a set of nodes a 
zero-chain if the nodes form a chain in the tree and all the edges between them 
are zero-edges of the same kind (i.e., if one edge is a positive zero-edge, then 
the others are also positive zero-edges, for example). 

We obtain the following theorem related to implicit suffix links: 

Theorem 1. For any edge e, the set of nodes having implicit suffix links to e 
forms at most $2|\Pi| + 1$ zero-chains in the tree. Furthermore, the length of 
each zero-chain is at most $|\Pi|$. 

Proof. Let v and v' be two nodes with implicit suffix links to the same edge. If 
v is an ancestor of v' and there is a node u between v and v', it is obvious that 
sl(u) is also between sl(v) and sl(v'). 

If neither of these two nodes is an ancestor of the other, let w be the lowest 
common ancestor of v and v' in the suffix tree. Note that w is not the root, 
because both of the first s-encoded characters of the labels of v and v' must be 
0. Since one of the s-encoded strings of $\mathrm{suffix}_2(\sigma_v)$ and 
$\mathrm{suffix}_2(\sigma_{v'})$ must be a prefix of the other, the outgoing edges 
to v and v' must be zero-edges. Thus w must be a zero-node. 

Lemma 4 implies that, under the negative or positive zero-edge out of w, 
there is only one zero-chain formed by the set of nodes having implicit suffix 
links to e. Furthermore, a normal zero-edge can have at most $|\Pi| - 1$ normal 
zero-edges on a path to some leaf from it, according to Lemma 3. Thus we can 
conclude that there are at most $2|\Pi| + 1$ zero-chains. Also according to 
Lemma 3, it is obvious that the lengths of the zero-chains are at most $|\Pi|$. 

Note that, in the case of the p-suffix tree (i.e., when there are no 
complementary character pairs), such nodes form only one zero-chain. According 
to this theorem, there are at most $2|\Pi|^2 + |\Pi|$ implicit suffix links to one 
edge. Hence when we split an edge, it takes $O(|\Pi|^2)$ time to update all the 
corresponding implicit suffix links if we do it naively. If $|\Pi|$ is constant, 
the bound is $O(n)$, but if $|\Pi|$ is large, it causes a problem. From now on, we 
consider how to reduce the time to $O(\log|\Pi|)$. 

Consider $O(n)$ sets of nodes which are empty at first. We perform two types
of procedures on the sets. One is inserting a node into one of the sets, and the
other is splitting one of the sets into two sets according to the label lengths
of the nodes in the set, as follows: one of the two newly constructed sets is
the set of nodes whose label lengths are larger than a specified length, and the
other is the set of nodes whose label lengths are smaller than the same specified
length. Suppose that the upper bound on the size of a set is $p$, and that the
total number of nodes inserted is $O(n)$. A balanced data structure can achieve
the following time bounds for these two procedures:
1. A new item can be inserted into a set in $O(\log p)$ time.

2. A set $S$ can be split into two sets as above in time linear in the size of
   the smaller of the two resulting sets.

It is easy to see that the total time taken by procedure 1 is $O(n \log p)$, and
that the total time taken by procedure 2 is also $O(n \log p)$. We can maintain the
implicit suffix links by this procedure. In this case, $p = O(|\Pi|^2)$; thus the
total time required for maintaining them is $O(n \log |\Pi|)$.
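The splitting step (procedure 2) can be sketched as follows. This is only an
illustration under stated assumptions, not the data structure used in the paper:
the set is held as a Python deque of hypothetical (label_length, node) pairs kept
in increasing order of label length, nodes with label length below the threshold
go to the first part and all others to the second, and the logarithmic-time
insertion of procedure 1 (which needs a balanced tree) is not shown. The function
name split_by_length is made up for this sketch.

```python
from collections import deque


def split_by_length(nodes, threshold):
    """Split a sorted deque of (label_length, node) pairs at `threshold`.

    Returns (shorter, longer). Elements are taken alternately from both ends,
    and the loop stops as soon as one side is exhausted, so the number of
    moved elements is proportional to the smaller of the two parts.
    """
    left, right = deque(), deque()
    while nodes:
        if nodes[0][0] >= threshold:
            # No short nodes remain: give back the long ones taken so far.
            nodes.extend(right)
            return left, nodes
        left.append(nodes.popleft())
        if not nodes:
            break
        if nodes[-1][0] < threshold:
            # No long nodes remain: give back the short ones taken so far.
            nodes.extendleft(reversed(left))
            return nodes, right
        right.appendleft(nodes.pop())
    # The deque ran empty: every element is already in `left` or `right`.
    return left, right


def lengths(part):
    """Helper for inspection: label lengths of a part, in order."""
    return [length for length, _ in part]
```

Because each split costs time proportional to the smaller part, every node is
charged only when it lands in the smaller half, which is the usual argument behind
the $O(n \log p)$ total stated above.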

From now on, we discuss the number of nodes scanned in the algorithm.
Note that it is $O(n)$ in Ukkonen’s algorithm [II 0|1 Sj . Note also that the
analysis for it is almost the same as that of the number of rescanned nodes in
Baker’s algorithm for p-suffix trees m-

Theorem 2. The number of scanned edges that are not normal zero-edges is at 
most n. 

Proof. In constructing the s-suffix tree of a string $S$, consider that an edge
$(u, v)$ that is not a normal zero-edge is scanned when we search for the locus
of $\mathrm{suffix}_j(\mathrm{prefix}_i(S))$ in the $j$th extension of the
$(i+1)$th phase. Let $u = node(\mathrm{suffix}_j(\mathrm{prefix}_{k-1}(S)))$.

Consider the locus of $\mathrm{suffix}_{j'}(\mathrm{prefix}_{k-1}(S))$ for any
$j' < j$. Note that we do not perform the $j''$th ($j'' > j$) extension in the
$i'$th phase ($i' < i$) of the algorithm. If there exists a node ($w$) for the
locus, then it must have an explicit suffix link to some node, because
$\mathrm{suffix}_{j'}(\mathrm{prefix}_k(S))[k - j' + 1]$ is not 0. This
means that $w$ cannot be scanned in the algorithm. Accordingly, we conclude that
the number of such edges is at most $n$.

According to LemmaEl the number of normal zero-edges that are scanned in
a single phase is at most $|\Pi|$. Thus the total number of nodes that have
implicit suffix links and are scanned in the algorithm is at most $n|\Pi|$. Note
that the outgoing normal zero-edge from a node can be accessed in $O(1)$ time.
Thus the total scanning time will be $O(n(|\Pi| + \log |\Sigma|))$.

Thus we conclude that the total computing time of our algorithm is
$O(n(|\Pi| + \log |\Sigma|))$. If $|\Pi|$ and $|\Sigma|$ are constant, it is $O(n)$.
In fact, in the problem of RNA/DNA structural matching ($|\Sigma| = 0$ and
$|\Pi| = 4$), this basic algorithm is efficient enough.



3.3 Faster Algorithm when $|\Sigma| = 2$

In this section, we will improve the algorithm given in the previous section to
$O(n \log |\Pi|)$ when $|\Sigma| = 2$. The technique used in this section is almost
the same as Kosaraju’s technique P) for improving the rescanning procedure of
Baker’s algorithm.

In each extension, if we insert a new node into an edge $(u,v)$, we will scan
from $sl(u)$ to find the locus of the suffix link of the new node. We want to
reduce the number of zero-edges encountered in this scanning. Let $Z_{uv}$ be the
set of normal zero-edges whose starting node is encountered in scanning from
$sl(u)$ to $sl(v)$. Note that $|Z_{uv}| \le |\Pi|$.

For each edge, we maintain a concatenable queue [mi, or c-queue for short.
Each c-queue for $(u, v)$ contains a set of edges in $Z_{uv}$ arranged in the order
of their depth. We maintain the c-queues lazily; that is, we put edges into
c-queues only when we first encounter the edges in scanning. Thus the c-queue for
edge $(u,v)$ does not contain all of the edges in $Z_{uv}$. The same edge can appear
only in 2 c-queues because $|\Sigma| = 2$.

The time taken to insert an edge into a c-queue is $O(\log |\Pi|)$. In scanning
for the locus of depth $d$, we begin from the deepest edge in the corresponding
c-queue whose starting depth is shallower than $d$. Such an edge can be found in
$O(\log |\Pi|)$ time. Furthermore, we must split the c-queue when the edge is
split, and this can be done in $O(\log |\Pi|)$ time.

In this way, we can achieve an $O(n \log |\Pi|)$ time algorithm if $|\Sigma|$ is a
small constant. Note that the space complexity is $O(n)$. From now on, we consider
how to achieve $O(n(\log |\Pi| + \log |\Sigma|))$ time.

3.4 Faster Algorithm for Arbitrary $\Sigma$

In this section, we will improve the algorithm given in the previous sections
to $O(n(\log |\Pi| + \log |\Sigma|))$, which is far more efficient for larger
alphabets. The technique used in this section is also almost the same as
Kosaraju’s technique HI for improving the rescanning procedure of Baker’s
algorithm, except that our algorithm is on-line.

For any given string $S$, we can construct two strings $S_1$ and $S_2$ as follows:
$S_1$ is $S$ with every parameter replaced by integer 0. $S_2$ is $S$ with every
fixed symbol replaced by a single fixed symbol. We can construct the implicit
suffix tree of $\mathrm{prefix}_i(S_1)$ by Ukkonen’s algorithm and the s-suffix
tree of $\mathrm{prefix}_i(S_2)$ by the above algorithm while constructing the
implicit s-suffix tree for $\mathrm{prefix}_i(S_2)$. Note that the construction of
the suffix tree of $S_1$ takes $O(n \log |\Sigma|)$ time and that of the s-suffix
tree of $S_2$ takes $O(n \log |\Pi|)$ time.
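The construction of the two auxiliary strings can be illustrated with a short
sketch. It assumes that the input is given as a Python string together with the
set of parameter characters; the function name split_string and the filler symbol
"#" are hypothetical choices made for this illustration only.

```python
def split_string(s, parameters, filler="#"):
    """Build the two auxiliary sequences described in the text.

    s1: s with every parameter replaced by the integer 0.
    s2: s with every fixed symbol replaced by one fixed symbol (`filler`).
    Both are returned as lists so that 0 and characters can be mixed.
    """
    s1 = [0 if c in parameters else c for c in s]
    s2 = [c if c in parameters else filler for c in s]
    return s1, s2
```

For example, with parameters {x, y} and fixed symbols {A, B}, the string xAyBx
yields S_1 = 0 A 0 B 0 and S_2 = x # y # x.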

In any phase $i$, we can compute the length of the common prefix of the s-encoded
strings of $\mathrm{suffix}_j(\mathrm{prefix}_i(S))$ and
$\mathrm{suffix}_k(\mathrm{prefix}_i(S))$ for any $k < i$ and $j < i$ as
follows: According to 0, we can compute the lowest common ancestor of two
nodes of a suffix tree in constant time even while we are constructing the
tree. Thus we can compute in constant time the length of the common prefix
of $\mathrm{suffix}_j(\mathrm{prefix}_i(S_1))$ and
$\mathrm{suffix}_k(\mathrm{prefix}_i(S_1))$ for any $k < i$ and $j < i$. We
can also compute in constant time the length of the common prefix of the
s-encoded $\mathrm{suffix}_j(\mathrm{prefix}_i(S_2))$ and
$\mathrm{suffix}_k(\mathrm{prefix}_i(S_2))$ for any $k < i$ and $j < i$.
The length of the common prefix of the s-encoded strings of
$\mathrm{suffix}_j(\mathrm{prefix}_i(S))$ and
$\mathrm{suffix}_k(\mathrm{prefix}_i(S))$ for any $k < i$ and $j < i$ is the
smaller of these two values.

By maintaining a set of edges that forms a zero-chain in a c-queue, we can
speed up the scanning as follows: Consider a situation in which we encounter a
zero-chain while scanning. First we find a leaf $w$ that is a child of the deepest
node in the zero-chain and is not a child of the target locus. Next, we compute
the length of the common prefix of the s-encoded strings of the target
string and $\sigma_w$. Then we can find the deepest edge in the zero-chain that is
an ancestor of the locus in $O(\log |\Pi|)$ time. A normal zero-edge from a node
can be accessed in constant time. In this way, we can achieve an
$O(n(\log |\Sigma| + \log |\Pi|))$ computation time. For more details of this
speeding-up technique, see m

4 Computational Experiments 

Using the s-suffix tree of a string we can perform tasks such as the following: 

— Given a long sequence of length $n$ and some constants $l$ and $r$, we can find a
set of more than $r$ substrings that s-match with each other and are longer
than $l$ in $O(n(\log |\Sigma| + \log |\Pi|) + T_{output})$ time, where
$T_{output}$ is the output size.

— Given more than one sequence, we can find the longest common s-encoded
pattern of these sequences in $O(n(\log |\Sigma| + \log |\Pi|) + T_{output})$
time, where $T_{output}$ is the output size and $n$ is the sum of the lengths of
the input sequences.

Note that, if the size of the alphabet is constant, both of these tasks can be
completed in linear time. In this section, we describe experiments on RNA and
DNA sequences, in which we constructed the s-suffix tree of DNA sequences,
where $\Sigma = \emptyset$, $\Pi = \{A, U, G, C\}$, A is the complement of U and G
is the complement of C. (In DNA sequences, T is present instead of U.)

We conducted experiments on three HIV (human immunodeficiency virus)
RNA complete sequences: (A) a sequence of length 9719 (accession number:
K03455), (B) a sequence of length 9748 (accession number: X01762) and (C) a
sequence of length 8981 (accession number: AF067156). We also use four very
long DNA sequences of E. coli, each of which has the same length, 1 Mbp =
1,000,000 bp. The length of the full genome sequence of E. coli is about 4.64
Mbp, and these four sequences are the following regions of the sequence: (D) 1
bp-1,000,000 bp, (E) 1,000,001 bp-2,000,000 bp, (F) 2,000,001 bp-3,000,000
bp, and (G) 3,000,001 bp-4,000,000 bp.

First, we compare the size of the s-suffix tree with that of the normal suffix
tree of the same sequences. Table 1 shows the numbers of nodes in the suffix
trees and the s-suffix trees of the seven sequences. According to the table, the 
sizes of the s-suffix trees are slightly smaller than those of the normal suffix trees 
in all cases, but the numbers of nodes in them are almost the same regardless of 
the length of the sequence. For any sequence, both the number of nodes in the 
suffix tree and that of the s-suffix tree are about 1.6 to 1.7 times the length of 
the sequence. Thus we can say that the s-suffix trees are very compact and that 
it is as reasonable to build them as to build the normal suffix trees. 

Consider that a structural pattern $\alpha$ of length $l$ appears $r$ times, but
any pattern that is constructed by extending it to the right, such as $\alpha c$
($c \in (\Sigma \cup \Pi)$), appears fewer than $r$ times. We call such a pattern
$\alpha$ a “maximal structural pattern.” We now give the results of an experiment
to find maximal structural patterns which are longer than $l$ and repeated more
than $r$ times for
Table 1. Number of nodes in suffix trees and s-suffix trees

Sequence       (A)     (B)     (C)     (D)       (E)       (F)       (G)
Length         9719    9748    8981    1000000   1000000   1000000   1000000
Suffix Tree    16135   16217   14710   1640492   1635995   1638043   1638008
s-Suffix Tree  16033   16132   14666   1631525   1628821   1630104   1628923



Table 2. Examples of maximal structural patterns

(1)
Position   Sequence
646095     CCCGCTTCGGCTTCA
703617     GGGCGTTGCCGTTGA
779110     TTTATGGTAATGGTC
888469     TTTATCCTAATCCTG

(2)
Position   Sequence
371484     ACTGCGCCATGAAGATGAC
884639     GACTATAAGCTGGTGCTGA



some given $l$ and $r$. Table 2 shows two examples of maximal structural patterns
found in E. coli sequence (D): (1) is a set of patterns of length 15 that appears
four times, and (2) is a set of patterns of length 19 that appears twice in
the sequence. Every sequence is different from the others, but these sequences
s-match with each other.
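The claim that the listed sequences s-match can be checked with a small sketch.
It rests on an assumption about the definition, which is given earlier in the
paper and is only paraphrased here: two strings of parameters s-match if there is
a one-to-one renaming of the parameters that turns one string into the other and
maps complementary pairs to complementary pairs. The sketch further assumes that
every character is a parameter (empty set of fixed symbols) and uses the DNA
complement pairs A-T and G-C; the function name s_match is hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def s_match(p, q, complement=DNA_COMPLEMENT):
    """Return True if p can be renamed into q by a bijection on the
    parameters that also maps each complement to the complement of the image.
    (Assumed paraphrase of the s-match relation, parameters only.)
    """
    if len(p) != len(q):
        return False
    forward, backward = {}, {}
    for a, b in zip(p, q):
        # Renaming a -> b forces complement(a) -> complement(b) as well.
        for x, y in ((a, b), (complement[a], complement[b])):
            if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
                return False
    return True


GROUP_1 = ["CCCGCTTCGGCTTCA", "GGGCGTTGCCGTTGA",
           "TTTATGGTAATGGTC", "TTTATCCTAATCCTG"]
GROUP_2 = ["ACTGCGCCATGAAGATGAC", "GACTATAAGCTGGTGCTGA"]
```

Under these assumptions every pair within group (1) and the pair in group (2) of
Table 2 is accepted by the check, for instance via the renaming C to G, G to C
between the first two sequences of group (1).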

Table 3 shows the number of maximal patterns whose lengths ($l$'s) are larger
than some given length. In the table, a “normal pattern” means an ordinary
string pattern that can be found with an ordinary suffix tree. Notice that the
structural patterns include the normal patterns. According to the table, we can
see interesting facts such as that the proportion of normal patterns increases
with the lengths of the patterns.

5 Concluding Remarks 

We have proposed a new data structure called the structural suffix tree, or
s-suffix tree for short. We also proposed an on-line
$O(n(\log |\Sigma| + \log |\Pi|))$ algorithm for constructing it, where $\Sigma$
is an alphabet of fixed symbols and $\Pi$ is an alphabet of
parameters. This data structure enables an efficient search for frequent patterns 
of structures of RNA sequences or single-stranded DNA sequences. It also enables 
a common structure pattern to be efficiently found in more than one sequence. 
We also showed the practicality of our data structure and our algorithm by 
reporting computational experiments for finding structural patterns from RNA 
sequences of HIV and DNA sequences of E. coli using the s-suffix tree. 

Several tasks remain for the future. Two sequences can have the same struc- 
ture even if they do not have the same s-encoded string patterns. Furthermore, 
it is difficult to apply our algorithm to the problem of proteins, where the combi- 
nations are far more complicated. Thus we should strive to create more general 






Table 3. Number of structural/normal patterns

(1) HIV RNA sequences

l      Pattern      (A)    (B)    (C)
> 5    Structural   5329   5061   4887
       Normal       1381   1147   1000
> 10   Structural   670    451    282
       Normal       479    363    126
> 15   Structural   336    123    4
       Normal       336    123    3

(2) E. coli sequences

l      Pattern      (D)      (E)      (F)      (G)
> 10   Structural   495371   499205   498728   497701
       Normal       90968    85899    88681    90298
> 15   Structural   4723     4140     4466     4529
       Normal       2402     1728     2095     2147
> 20   Structural   330      106      192      192
       Normal       330      103      192      190



data structures and algorithms for structural pattern matching of biological
sequences. Recently, Farach introduced a linear-time suffix tree construction
algorithm for strings over an integer alphabet {1, ..., n}. It is an open problem
whether or not such a linear-time algorithm exists for constructing s-suffix
trees or p-suffix trees.
Abstract. Many efficient algorithms have been developed to compute
the length of a longest common subsequence (LCS) between two strings. 
In general, an LCS is not unique but current methods only recover a 
single LCS. We investigate the problem of finding all longest common 
subsequences. A simple extension of the reconstruction method used by 
existing algorithms would seriously harm their time complexities. We 
present observations on a symmetry of the LCS problem which allow us 
to develop a general method to obtain a representation of all longest 
common subsequences while preserving the favorable time bounds of 
known algorithms. 



1 Introduction 

The problem of determining the similarity of two sequences
$A = a_1 a_2 \cdots a_m$ and $B = b_1 b_2 \cdots b_n$, $m \le n$, over some finite
alphabet $\Sigma$ arises in many different areas of application, including one in
the study of the evolution of long molecules. A widely accepted measure of
similarity is the Levenshtein distance, which can be evaluated by a dynamic
programming algorithm in time $O(mn)$ 1 1 Yj . A common subsequence is any
sequence which can be obtained from both strings $A$ and $B$ by deleting zero or
more (not necessarily adjacent) symbols. The length $p$ of a longest common
subsequence is closely related to the Levenshtein distance. In fact, it can be
viewed as a special case, and thus a variation of the dynamic programming
algorithm can be used to compute this value.
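As a point of reference, the textbook $O(mn)$ dynamic program for the length $p$
of a longest common subsequence can be sketched as below. This is the standard
recurrence, not one of the parameterized algorithms surveyed in this paper, and
the function name lcs_length is chosen for this illustration.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b, in O(|a||b|) time.

    prev[j] holds the LCS length of the processed prefix of a and b[:j];
    only two rows of the dynamic programming table are kept.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)      # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

For the strings abacbcba and cbabbacac the function returns 5.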

It was observed early that efficiency could be improved for typical
applications by exploiting some structural properties of the LCS problem jOl E! •
The time complexities of such algorithms are parameterized by variables other
than the sizes of the two input strings (for surveys see |31 El])- For example,
there are algorithms which perform well when the length of an LCS is short
($O(pm)$), while preference would be given to an $O(n(m - p))$ algorithm j I ,'i)
when the length of an LCS is long. In Section 2 we will briefly review the
paradigm underlying most of these algorithms. The primary goal of these methods
is to calculate the length of an LCS as quickly as possible.

In some applications, e.g. when building an alignment of two DNA sequences,
one is also interested in seeing in which places the two sequences differ and
where they match. Alternatively, an edit script may be required which transforms
$A$ into $B$ by insertions and deletions. All methods can easily be extended to
recover a single longest common subsequence, and this does not affect the
asymptotic time complexity of the algorithms. However, a longest common
subsequence is not unique, and the (arbitrary) LCS reconstructed is primarily due
to the implementation of a particular algorithm. Thus it may not reveal an
expected relationship between the two sequences which another LCS would be able
to indicate. Therefore it is of interest to compute a representation which
highlights the structure of all longest common subsequences and from which each
single LCS may be recovered easily. So far, this may only be achieved by the
dynamic programming algorithm and (partially) by the construction proposed in [2].

As we will argue in Section 3, a simple extension of the classical method (trace
back) to recover an LCS used by all the efficient algorithms will seriously harm
their time complexities. In Section 4 we give a characterization of LCSs which is
based on a symmetry of the LCS problem, and in Section 5 we show how to use
it to maintain the original time complexities of the algorithms while computing
a suitable representation of all longest common subsequences.

2 Matches and Contours 

It is common to describe the LCS problem in the following way. An ordered pair
$(i,j)$, $1 \le i \le m$, $1 \le j \le n$, is called a match if $a_i = b_j$. The set
$M$ of all matches can be represented by a matching matrix of size $m \times n$ in
which each match is identified by a circle. Two matches $(i,j)$ and $(i',j')$ may
be part of the same common subsequence if and only if $i < i' \wedge j < j'$ or
$i' < i \wedge j' < j$. A sequence $S \subseteq M$ of matches that is strictly
increasing in both components is called a chain. The LCS problem can now be
viewed as finding a longest chain. It is solved by employing a technique called
sparse dynamic programming [7], which rests on some structural properties of the
LCS problem.

Let $|LCS(A,B)|$ denote the length of an LCS between strings $A$ and $B$ and
let $A_i = a_1 \ldots a_i$, $0 \le i \le m$, denote the length $i$ prefix of $A$.
For a match $(i,j)$ we say that it is of rank $k$ if the length of a longest chain
ending at $(i,j)$ is $k$. We can collect matches of the same rank $k$ in classes

$$C_k = \{(i,j) \in M : |LCS(A_i, B_j)| = k\}.$$

Thus, $M$ can be partitioned into classes $C_1, C_2, \ldots, C_p$, each class
containing matches of the same rank. It is well known that these classes exhibit
a special structure in the matching matrix. If sorted in increasing order with
respect to the first component and in decreasing order with respect to the second
component, matches belonging to the same class shift from right to left, and they
form so-called contours when connected by lines as shown in Figure 1(a). Contours
of different classes may never cross or touch, and the contour of each class
divides the matrix into a top/left part and a bottom/right part. Each contour can
be completely specified by dominant matches, i.e. those matches $(i,j)$ in a class
for which there is no other match $(i',j')$ in the same class with
$i' = i \wedge j' < j$ or $i' < i \wedge j' = j$. We use $D_k \subseteq C_k$ to
denote the dominant matches of rank $k$. They are located in the top/left corners
of a contour and they are indicated by bold circles in Figure 1(a). On the other
hand, each match in $C_k \setminus D_k$ is dominated by some match in $D_k$. Thus
there will always be an LCS consisting only of dominant matches. For more
background on this see for example HH.

Fig. 1. Matches and Contours (a), LCSs-graph (b).
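The notions of rank, class and dominant match can be made concrete with a naive
sketch. It uses the full $O(mn)$ dynamic programming table rather than any of the
efficient contour algorithms discussed in this paper, it applies the dominance
condition exactly as worded above (no other match of the same class in the same
row further left or in the same column further up), and the function names are
made up for this illustration. Positions are 1-based, as in the text.

```python
def match_classes(a, b):
    """Map each rank k to the set C_k of matches (i, j) of that rank."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    classes = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # A match ends a longest chain of length LCS(A_{i-1}, B_{j-1}) + 1.
                table[i][j] = table[i - 1][j - 1] + 1
                classes.setdefault(table[i][j], set()).add((i, j))
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return classes


def dominant(matches):
    """Dominant matches D_k of one class C_k, per the condition in the text."""
    return {
        (i, j)
        for (i, j) in matches
        if not any((x == i and y < j) or (x < i and y == j) for (x, y) in matches)
    }
```

For A = abacbcba and B = cbabbacac this gives five classes, 24 matches in total,
and the rank-1 class {(1,3), (1,6), (1,8), (2,2), (4,1), (6,1)}, of which only
(1,3), (2,2) and (4,1) are dominant.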

Speedup was gained for many algorithms by concentrating on the computation of
dominant matches and by exploiting the above properties in clever ways
|T^ im Q 151 IT 3 ). For example, the algorithm introduced in uni has time
complexity $O(n|\Sigma| + \min\{pm, p(n-p)\})$ where time $O(n|\Sigma|)$ is used
for some preprocessing (solving the so-called string identification problem) and
time $O(\min\{pm, p(n-p)\})$ is used to determine the dominant matches. Our goal
will be to maintain such favorable time complexities while computing a
representation of all LCSs. To this end it is important to note that the time
complexity of the main processing stage is an upper bound on the number of
dominant matches since each dominant match is touched at least once by these
algorithms P|. On the other hand it is known that $p \le d \le r \le mn$. In
particular there is no fixed correlation between the total number of matches,
$r$, and the number of dominant matches, $d$. There are instances where the latter
can be very small while the former essentially remains quadratic. E.g., consider
strings over a small alphabet where each symbol occurs the same number of times
and which have an LCS close to the string length.

3 Representing All LCSs 

A single LCS may be recovered by storing a pointer with each dominant match
which points to one of its direct predecessors in a chain. Starting with any
dominant match of highest rank we can trace an LCS in time $O(p)$ by simply
following these pointers. This is the classical way to reconstruct an optimal
solution in dynamic programming methods. But in general an LCS is not unique.
Consider the two strings A = abacbcba and B = cbabbacac. Their structure is
shown in Figure 1(a). There are five different character sequences which all form
an LCS, namely abacc, abaca, abbca, babca, and babba. Further, for each character
sequence there may exist different embeddings, i.e. positions in the two strings
to which the characters of an LCS map. E.g., the sequence abacc may correspond to
the sequence of matches (1,3), (2,4), (3,6), (4,7), (6,9) or to (1,3), (2,5),
(3,6), (4,7), (6,9). A canonical embedding of a fixed LCS is an embedding where
each character, starting from the beginning of the LCS, is assigned matching
positions in both sequences as small as possible. So the first sequence of
matches above is a canonical embedding while the second is not.

In PI a directed acyclic subsequence graph (DASG) is defined as a finite
automaton recognizing all subsequences of a string. The generalization for two
strings is a representation of the canonical embeddings of all LCSs and can be
built in time $O(n \log n + r)$. The only representation which includes all
embeddings of all LCSs is the dynamic programming matrix augmented with
appropriate back-pointers, which takes time and space $O(mn)$ to construct.

For each match $(i,j)$ define $CS(i,j)$ to be the maximum length of a common
subsequence of $A$ and $B$ containing $(i,j)$, i.e.

$$CS(i,j) := \max\{|S| : S \text{ is a chain and } (i,j) \in S\}.$$

We will now define the LCSs-graph, which seems to be the most appropriate
structure to represent all LCSs, including different embeddings.

Definition 1 (LCSs-Graph). The LCSs-graph of two sequences $A$ and $B$
which have LCSs of length $p$ is the directed acyclic graph $G = (V,E)$, where

1. $V = V_1 \cup V_2 \cup \ldots \cup V_p$,

2. $V_k = \{(i,j) \mid (i,j) \in C_k \wedge CS(i,j) = p\}$,

3. $E = \{[(i,j),(i',j')] \mid (i,j) \in V_k, (i',j') \in V_{k+1}, i < i', j < j'\}$.

The LCSs-graph for our example of Figure 1(a) is given in Figure 1(b). Note that
edges are easily derived from the nodes and so they need not be given explicitly.
Thus, our primary concern will be the efficient computation of nodes. The goal
is to construct $G$ in time $O(T + |G|)$ where $T$ is the time of any algorithm
which determines the dominant matches.

An important fact to note is that in general dominant matches are not
sufficient to represent all LCSs. It is easily checked that the LCS abbca,
indicated by back-arrows in Figure 1(a), cannot be represented solely by dominant
matches. On the other hand we don’t (yet) have a criterion to decide which
non-dominant matches are necessary for the construction of some LCS and which are
not.

Therefore, if we decide to reconstruct all LCSs using the classical
back-pointer approach there seems to be no way to avoid the creation of a separate
node for every match. Further, with each match we may have to store several
back-pointers. So this method will take time $\Omega(r)$. Such a time bound is
disastrous for all algorithms concentrating on the computation of dominant
matches. Their time complexity usually gives an upper bound on the number of
dominant matches but not on the total number of matches (see Section 2).



Efficient Computation of All Longest Common Subsequences 411 



4 A Characterization Based on Structural Symmetries

Our goal in this section is to establish a characterization of the nodes of the LCSs- 
graph which allows us to keep our attention restricted to dominant matches. This 
will support the development of an efficient construction to be given in the next 
section. 

We exploit a symmetry of the LCS problem which has already been used
fruitfully in connection with linear space computations jH]- Usually, the input
strings are considered in the typical reading direction from left to right (or
front to rear). This, however, is an arbitrary decision and in fact any algorithm
can be easily modified to work in the opposite direction (equivalently, the
original algorithm may be applied to the reversed input strings). We will note
some general facts on the relationship of contours computed in the usual way,
called forward contours, and those computed by considering the strings in the
opposite direction, called backward contours. Although the number of contours is
identical, we note that forward contours and backward contours might look very
different (see Fig. 1(a) and Fig. 2(a)). In particular, dominant matches on
forward contours need not be dominant matches on backward contours and vice
versa. Since contours partition the set of matches there is a unique forward
contour and a unique backward contour for each match. As before we use $C_k$ and
$D_k$ to denote matches and dominant matches on the $k$-th forward contour,
respectively. Likewise

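$$\overline{C}_k = \{(i,j) \in M : |LCS(A^i, B^j)| = k\}$$

is the set of matches on backward contour $k$, where $A^i = a_i \ldots a_m$,
$1 \le i \le m + 1$, denotes the length $m - i + 1$ suffix of $A$. The set
$\overline{D}_k \subseteq \overline{C}_k$ of dominant matches on backward contour
$k$ is formally defined by

$$\overline{D}_k := \{(i,j) \in \overline{C}_k \mid \neg\exists (i',j') \in \overline{C}_k : (i' = i \wedge j' > j) \vee (i' > i \wedge j' = j)\}.$$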

The fundamental observation, which forms the basis of our approach, is that the
value $CS(i,j)$ may be calculated very easily from the unique forward contour
and the unique backward contour a match $(i,j)$ belongs to.

Lemma 1. For each match $(i,j)$ the following holds:

$$(i,j) \in C_k \cap \overline{C}_{k'} \implies CS(i,j) = k + k' - 1.$$

Proof. Since $(i,j) \in C_k$ there is a chain of length $k$ ending at $(i,j)$, and
from $(i,j) \in \overline{C}_{k'}$ we can conclude that there exists a chain of
length $k'$ starting at $(i,j)$. Joining these chains at $(i,j)$ gives a chain of
length $k + k' - 1$.

Assume there is a common subsequence of length $> k + k' - 1$ containing
$(i,j)$. Then there would be a chain of length $> k$ ending at $(i,j)$ or there
would be a chain of length $> k'$ starting at $(i,j)$. But this would contradict
$(i,j) \in C_k$ or $(i,j) \in \overline{C}_{k'}$, respectively. □

This fact immediately implies the following characterization of the nodes
$V = V_1 \cup V_2 \cup \cdots \cup V_p$ of our LCSs-graph $G = (V, E)$.

Fig. 2. Backward contours (a) and essential non-dominant matches (b).

Corollary 1. Let $p$ be the length of an LCS of two input sequences $A$ and $B$.
Then

$$(i,j) \in V_k \iff (i,j) \in C_k \cap \overline{C}_{k'}, \quad k' = p - k + 1.$$
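Lemma 1 and Corollary 1 translate directly into a naive way of computing the node
layers of the LCSs-graph: compute the forward rank of every match, compute the
backward rank by running the same routine on the reversed strings, and keep the
matches whose ranks add up to $p + 1$. The sketch below does this with full
$O(mn)$ tables, so it does not achieve the time bounds aimed at in this paper;
all function names are hypothetical, and positions are 1-based.

```python
def ranks(a, b):
    """Forward rank of every match (i, j), plus the LCS length p."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                rank[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return rank, table[m][n]


def lcs_graph_layers(a, b):
    """Layers V_1..V_p: matches whose forward rank k and backward rank k'
    satisfy k + k' - 1 = p (Corollary 1)."""
    m, n = len(a), len(b)
    forward, p = ranks(a, b)
    reversed_rank, _ = ranks(a[::-1], b[::-1])
    layers = {k: set() for k in range(1, p + 1)}
    for (i, j), k in forward.items():
        # Position i of a corresponds to position m + 1 - i of reversed a.
        k_back = reversed_rank[(m + 1 - i, n + 1 - j)]
        if k + k_back - 1 == p:
            layers[k].add((i, j))
    return layers


def all_lcs_strings(a, b):
    """Character sequences spelled by the paths through consecutive layers."""
    layers = lcs_graph_layers(a, b)
    if not layers:
        return {""}
    spelled = {(i, j): {a[i - 1]} for (i, j) in layers[1]}
    for k in range(2, len(layers) + 1):
        nxt = {}
        for (i2, j2) in layers[k]:
            nxt[(i2, j2)] = {
                s + a[i2 - 1]
                for (i, j), strings in spelled.items()
                if i < i2 and j < j2          # edge condition of Definition 1
                for s in strings
            }
        spelled = nxt
    return set().union(*spelled.values())
```

For A = abacbcba and B = cbabbacac the enumerated strings include abacc, abaca,
abbca, babca and babba, the five LCSs listed for this example.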

We call a forward contour and a backward contour complementary if their ranks
add up to $p + 1$. The major drawback of Corollary 1 is that the given
characterization still relies on all matches on the various contours. With regard
to the efficient LCS algorithms it would be helpful to have a characterization
solely in terms of dominant matches. On the other hand, as can be seen by the
example given in Figure 2(b), there may be essential matches for the LCSs-graph
which are neither dominant on a forward contour (solid lines) nor dominant on a
backward contour (dashed lines). The LCS aad may not be generated without using
the non-dominant match (4,4).

Our initial observation is that two complementary contours may touch but
never cross each other. Further, the forward contour is to the bottom/right of
the backward contour and both contours have at least one match in common.

Lemma 2. Let $C_k$ and $\overline{C}_{k'}$ be two complementary contours, i.e.
$p = k + k' - 1$ where $p$ is the length of an LCS. Then it holds that

1. $\neg\exists\, (i,j) \in C_k, (i',j') \in \overline{C}_{k'} : i < i' \wedge j < j'$,

2. $\exists\, (i,j) \in C_k \cap \overline{C}_{k'}$.

Proof. Assume there are matches $(i,j) \in C_k$ and $(i',j') \in \overline{C}_{k'}$
such that $i < i'$ and $j < j'$. Then $(i',j') \in C_l$, $l > k$ (and
$(i,j) \in \overline{C}_{l'}$, $l' > k'$). So we could form a chain of length
$l + k' - 1 > k + k' - 1 = p$. A contradiction. If there were no match
$(i,j) \in C_k \cap \overline{C}_{k'}$ then, by Corollary 1, $V_k = \emptyset$ and
thus no LCS of length $p$ could exist. □
Fig. 3. The shape of common parts of complementary contours; panels (a), (b), (c).



We can now give the desired characterization. Informally, it states that common
parts of complementary contours can be completely specified by means of pairs
of dominant matches on both contours.

Lemma 3. Let $C_k$ be a forward contour and let $\overline{C}_{k'}$,
$k' = p - k + 1$, be a complementary backward contour. Then it holds:

$$(i,j) \in V_k \iff \exists (x,y) \in D_k, \exists (x',y') \in \overline{D}_{k'} : (x = i = x' \wedge y \le j \le y') \vee (x \le i \le x' \wedge y = j = y').$$

Proof. First we prove that a match $(i,j)$ according to the given characterization
in fact belongs to $V_k$. We show that $(i,j) \in C_k \cap \overline{C}_{k'}$.
Consider the case $x \le i \le x' \wedge y = j = y'$ (the case
$x = i = x' \wedge y \le j \le y'$ is symmetric). We claim that these
prerequisites imply $(x',y') \in C_k$, and therefore $(i,j) \in C_k$. Assume
$(x',y') \notin C_k$. Since $y = y'$, contour $C_k$ would then have to take a left
turn before reaching row $x'$. But this means there has to be a dominant match
$(x'',y'') \in D_k$ such that $x'' < x' \wedge y'' < y'$, contradicting Lemma 2. A
similar argument shows that $(x,y) \in \overline{C}_{k'}$ and hence
$(i,j) \in \overline{C}_{k'}$.

In order to prove the other direction we have to show that each match
$(i,j) \in V_k$ can be characterized in the claimed way. To this end we will show
that all common parts of two complementary contours, containing all matches in
$C_k \cap \overline{C}_{k'}$, can be split into horizontal and vertical pieces such
that each piece will be bordered by a dominant match $(x,y) \in D_k$ and by a
dominant match $(x',y') \in \overline{D}_{k'}$. From Lemma 2 we know that
complementary contours may only touch but never cross each other and that the
backward contour is to the top/left of the forward contour. Now follow both
complementary contours from top/right to bottom/left up to the first common
point $(i,j)$.

The main observation is that $(i,j) \in D_k \vee (i,j) \in \overline{D}_{k'}$.
There is only one possibility how both contours may join, namely when a vertical
piece of the backward contour meets a horizontal piece of the forward contour.
Since the contours may not cross each other there must be a bend in this point.
If the bend is on the backward contour, $(i,j) \in \overline{D}_{k'}$ (see
Fig. 3(a)), and if the bend is on the forward contour, $(i,j) \in D_k$ (see
Fig. 3(b)). If there is a bend on both contours, $(i,j) \in D_k \cap
\overline{D}_{k'}$ and the contours only touch in this point. In the former cases
both contours may have some path in common. If this path includes several bends
(see Fig. 3(c)) there must be a dominant match at each corner of these bends,
alternately from forward and backward contours. Finally, at some point the
contours may spread apart again, and there is a dominant match at this point too
for reasons of symmetry.

Thus, each common horizontal piece is bordered by dominant matches
$(x,y) \in D_k$ and $(x',y') \in \overline{D}_{k'}$ such that
$x \le x' \wedge y = y'$, and each common vertical piece is similarly bordered by
dominant matches such that $x = x' \wedge y \le y'$. □

Note that this characterization allows us to give a compact description of the
possibly much greater LCSs-graph. For each pair of complementary contours we
only have to record pairs of dominant matches which sandwich common parts
of both contours. I.e., it is sufficient to compute the sets $V'_k \subseteq V_k$
where

$$V'_k := (D_k \cap \overline{C}_{k'}) \cup (\overline{D}_{k'} \cap C_k), \quad k = 1, \ldots, p \text{ and } k' = p + 1 - k.$$

Let $V' = \bigcup_{k=1}^{p} V'_k$. Such a compact description may be helpful, if

— we need a space-efficient way to store all optimal solutions which allows the 
recovery of an explicit representation quickly if needed (see below); 

— further computations are to be done on all optimal solutions. It may be more 
efficient to do such computations on the compact representation 

We can also use the sets $V'_k$ to specify two distinguished LCSs. Consider the
match $r_k \in V'_k$ which is to the top/right of any other match in $V'_k$ and similarly
let $s_k \in V'_k$ denote the match which is to the bottom/left of any other match
in $V'_k$. Then, figuratively speaking, all LCSs will be located somewhere between
the two outer LCSs $R = r_1 r_2 \cdots r_p$ and $S = s_1 s_2 \cdots s_p$. This kind of information
might be useful in a recently presented learning algorithm for the LCS problem.
It can also be combined with a recent proposal for linear-space implementations
to devise a space-saving variation of the following method to construct the
LCSs-graph.

5 An Efficient Construction

Based on the characterization lemma above we can now develop a general method to determine the nodes
of the LCSs-graph. In a first step, simply compute the forward contours as well 
as the backward contours by any method of your choice. Then, for each pair of 
complementary contours, find the matches belonging to both contours. In order 
to perform this second step efficiently we assume that contours are given by 
linked lists, one for each forward contour and one for each backward contour, 
containing the (dominant) matches on the respective contours in sorted order, 
i.e. from top/right to bottom/left. Such lists may easily be generated by any 
algorithm during the first step of the computation. 

Let $L$ and $\bar{L}$ be two such lists corresponding to complementary contours. If
these lists contained all matches then our task would be to find identical
matches occurring on both lists. In view of Lemma 0 this could be done by a
simple appropriate scan which takes time proportional to the length of the two
lists. For reasons of efficiency, however, we may only assume that the given lists
contain the dominant matches of the respective contours. Lemma 0 showed that
knowing those dominant matches on complementary contours which belong to
both contours is sufficient to specify all nodes of the LCSs-graph $G = (V,E)$.
Since a dominant match on a forward contour need not be a dominant match
on the complementary backward contour, and vice versa, we can no longer use a
simple scan looking for identical matches on two complementary lists. But from
Lemma 0 we know that if a dominant match actually belongs to $V$ then there
must exist a dominant match on the complementary contour located in the same
row or column. Using this fact our desired dominant matches can still be found
in time proportional to the length of the two involved complementary lists by
scanning them as shown by the procedure “LCS-Merge” given in Figure 4. We
assume that the two input lists are sorted and that the end of each list is
marked by a sentinel $(\infty,\infty)$. In order to avoid duplicates in the output list we
assume that a match is only appended to the list by the operator • if it is not
identical to the current last element of the list.



Procedure LCS-Merge

Input:  $L$, a sorted list of dominant matches on a forward contour.
        $\bar{L}$, a sorted list of dominant matches on a complementary
        backward contour.

Output: $L_a$, a sorted list of those dominant matches from the two input
        lists which may be part of an LCS.

Method:

$(x,y) \leftarrow L$.first
$(x',y') \leftarrow \bar{L}$.first
while $(x,y) \ne (\infty,\infty)$ and $(x',y') \ne (\infty,\infty)$ do
   case
      $y' < y$ : $(x,y) \leftarrow L$.next
      $x' < x$ : $(x',y') \leftarrow \bar{L}$.next
      $x' = x \land y' \ge y$ : $L_a \leftarrow L_a \bullet (x',y') \bullet (x,y)$; $(x',y') \leftarrow \bar{L}$.next
      $x' > x \land y' = y$ : $L_a \leftarrow L_a \bullet (x,y) \bullet (x',y')$; $(x,y) \leftarrow L$.next
end



Fig. 4. Procedure identifying dominant matches on complementary contours. 
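
The scan of procedure LCS-Merge may be sketched in Python as follows. This is only an illustration under stated assumptions: matches are `(x, y)` tuples, both lists are sorted in the order the case analysis relies on (x non-decreasing, y non-increasing), and the names `lcs_merge`, `forward`, and `backward` are hypothetical. The third case is taken with `>=` so that a match occurring on both lists is consumed; the duplicate is then suppressed as described for the operator •.

```python
import math

SENTINEL = (math.inf, math.inf)  # marks the end of each list


def lcs_merge(forward, backward):
    """Scan two sorted lists of dominant matches as in procedure LCS-Merge.

    forward  -- dominant matches (x, y) of a forward contour
    backward -- dominant matches (x', y') of the complementary backward contour
    Assumption: both lists have x non-decreasing and y non-increasing.
    """
    out = []

    def append(match):
        # The operator "•": skip a match identical to the current last element.
        if not out or out[-1] != match:
            out.append(match)

    fwd = iter(list(forward) + [SENTINEL])
    bwd = iter(list(backward) + [SENTINEL])
    x, y = next(fwd)
    xp, yp = next(bwd)
    while (x, y) != SENTINEL and (xp, yp) != SENTINEL:
        if yp < y:
            x, y = next(fwd)
        elif xp < x:
            xp, yp = next(bwd)
        elif xp == x and yp >= y:
            append((xp, yp))
            append((x, y))
            xp, yp = next(bwd)
        elif xp > x and yp == y:
            append((x, y))
            append((xp, yp))
            x, y = next(fwd)
        else:
            # x' > x and y' > y: not expected for complementary contours.
            raise ValueError("inputs are not complementary contours")
    return out
```

Each iteration advances one of the two lists, so the running time is proportional to their total length.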



Theorem 1. A compact representation of all LCSs, i.e. the node set $V' \subseteq V$,
can be computed in time $O(T)$ and space $O(S)$ where $T$ and $S$ are the time and
space, respectively, of any algorithm creating all dominant matches.

Proof. The chosen algorithm will be invoked two times to determine the dominant
matches on forward contours and backward contours, respectively. Using
the procedure “LCS-Merge” to identify common parts of complementary contours
takes time proportional to the total number of dominant matches, which is
upper bounded by $T$.

$O(d)$ space will be occupied by the lists of dominant matches where $d$ is the
total number of dominant matches on forward contours and backward contours.
Depending on the chosen algorithm we may need some additional space to store
information gained in a preprocessing stage solving the so-called string identification
problem. □

If needed the LCSs-graph can be constructed explicitly from our lists of dominant
matches $V'_k$ in time proportional to the size of the graph. In order to determine
the non-dominant matches still missing we just have to consider two succeeding
dominant matches on these sorted lists. If they have a common component they
are the endpoints of an interval which may contain additional matches to be
included. Using two standard lookup-tables which contain the next occurrence of
a symbol $a \in \Sigma$ to the right of a given position in strings $A$ and $B$, respectively,
we can identify each such match in constant time. These tables, which take
$O(|\Sigma| \cdot n)$ time and space to construct, are required anyway by many algorithms
computing dominant matches. There is also an $O(n)$ time and space variation of
these tables which provides the desired information in $O(\log |\Sigma|)$ time per query.
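
A next-occurrence table of the kind mentioned above may be sketched as follows. The layout is an assumption made for illustration (0-based positions, one dictionary per position, the hypothetical name `next_occurrence_table`); it is not taken from any particular algorithm cited in the text.

```python
def next_occurrence_table(s, alphabet):
    """Return nxt where nxt[i][a] is the smallest index j >= i with s[j] == a.

    If a does not occur at or after position i, nxt[i][a] is len(s).
    Positions are 0-based; the table has len(s) + 1 rows.
    """
    n = len(s)
    nxt = [None] * (n + 1)
    nxt[n] = {a: n for a in alphabet}
    # Fill the table from right to left, copying the row to the right.
    for i in range(n - 1, -1, -1):
        row = dict(nxt[i + 1])
        row[s[i]] = i  # s[i] itself is the nearest occurrence of that symbol
        nxt[i] = row
    return nxt
```

Construction copies one row of size $|\Sigma|$ per position, and a query is a single lookup. The next occurrence strictly to the right of position `i` is `nxt[i + 1][a]`.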

Edges can be determined according to the definition of the LCSs-graph by
scanning sorted neighboring node lists $V_k$ and $V_{k+1}$ as follows. We consider each
match on $V_k$ in succession. Let $(x,y)$ be the first match on $V_k$. Scanning $V_{k+1}$,
we find the first match $(x',y')$ such that $x < x' \land y < y'$. A pointer $P$ to this
position on $V_{k+1}$ is saved and we insert the edge $((x,y),(x',y'))$. Proceeding on
$V_{k+1}$, we continue to insert edges leaving $(x,y)$ as long as $x < x' \land y < y'$ holds.
Then we consider the next match on $V_k$, starting the scan on $V_{k+1}$ at the position
indicated by $P$. Note that in general matches on $V_{k+1}$ will be considered several
times. We distinguish two cases depending on whether an edge is inserted when
considering a match on $V_{k+1}$ or not. In the former case we can assign the cost
to the edge inserted. There are two cases which do not lead to the insertion
of an edge:

1. searching for the first match on $V_{k+1}$ which is the endpoint of an edge leaving
the current match on $V_k$: in this case the pointer $P$ ensures that this occurs
at most once for each match on $V_{k+1}$.

2. reaching the first match on $V_{k+1}$ which no longer is an endpoint of an edge
leaving the current match on $V_k$: this happens at most once for each match
on $V_k$.

Thus, we have shown the following theorem. 

Theorem 2. The LCSs-graph $G = (V,E)$ can be constructed explicitly in time
and space $O(n|\Sigma| + |V| + |E|)$ from the compact representation $V'$.

If one is not interested in different embeddings of the same character sequence
of an LCS we can also construct a reduced graph $\hat{G} = (\hat{V}, \hat{E})$ from $G$ which only
contains a single (canonical) embedding for each possible character sequence of
an LCS. This is done in time $O(|G|)$ via an appropriate breadth first search on
$G$ which eliminates all edges and nodes not suitable for a canonical embedding.
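
The edge scan between two neighboring node lists used for the explicit construction may be sketched as follows. The sketch assumes what the pointer argument relies on: for every match on the first list the endpoints of its edges form one contiguous run on the second list, and the start of that run never moves backward. The name `build_edges` and the list representation are hypothetical.

```python
def build_edges(v_k, v_k1):
    """Connect each match (x, y) in v_k to the matches (x', y') in v_k1
    with x < x' and y < y', scanning v_k1 with a saved pointer."""
    edges = []
    p = 0  # pointer P: where the scan of v_k1 starts for the next match
    for (x, y) in v_k:
        q = p
        # Skip matches that are not yet endpoints of an edge leaving (x, y).
        while q < len(v_k1) and not (x < v_k1[q][0] and y < v_k1[q][1]):
            q += 1
        if q < len(v_k1):
            p = q  # remember the first endpoint found
        # Insert edges as long as the condition holds.
        while q < len(v_k1) and x < v_k1[q][0] and y < v_k1[q][1]:
            edges.append(((x, y), v_k1[q]))
            q += 1
    return edges
```

Every step of the two inner loops either inserts an edge or falls into one of the two non-inserting cases listed in the text, which is where the bound proportional to the size of the graph comes from.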

6 Conclusion 

Using a symmetry of the LCS problem we gave an exact characterization of 
all matches possibly occurring on an LCS. From this we developed a general 
method to compute a compact representation of all LCSs while maintaining the 
favorable time complexities of known efficient algorithms for determining the 
length of an LCS. A more suitable representation, the LCSs-graph G, can be 
constructed from the compact representation in time proportional to the size of 
G. 

It would be interesting to see whether a similar characterization
could be given for more than two input sequences. Another open question
concerns the number $d$ of dominant matches on forward contours and the number
$\bar{d}$ of dominant matches on backward contours. Does the structure of the LCS
problem allow us to establish an upper bound on $|d - \bar{d}|$ which shows that these
two values may not differ very much?

Abstract. We propose a blocked version of Floyd’s all-pairs shortest-paths
algorithm. The blocked algorithm makes better utilization of cache
than does Floyd’s original algorithm. Experiments indicate that the
blocked algorithm delivers a speedup (relative to the unblocked Floyd’s
algorithm) between 1.6 and 1.9 on a Sun Ultra Enterprise 4000/5000 for
graphs that have between 480 and 3200 vertices. The measured speedup
on an SGI O2 for graphs with between 240 and 1200 vertices is between
1.6 and 2.

Keywords: All pairs shortest paths, blocking, cache, speedup. 



1 Introduction 

Traditionally, algorithms are developed, analyzed, and optimized for the RAM
computer model in which a computer has a single uniformly accessible memory.
Contemporary computers, however, have multiple levels of memory and
the memory access time varies significantly from one memory level to the next.
For example, contemporary Sun and SGI workstations have an L1 cache, an L2
cache, and a main memory. The L1 cache in a Sun Ultra Enterprise 4000/5000
is 16 KB, the L2 cache is 4 MB, and main memory is in excess of 100 MB.
Additionally, a contemporary computer has a limited number of registers, ten
to twenty. Typically, it takes 1 cycle to access data from L1 cache. When the
desired data is not in L1 cache, we experience an L1 miss and the data is brought
from L2 cache to L1 cache using 6 to 10 cycles. If the desired data is not in L2
cache either, then we experience an L2 miss and data is fetched from main
memory into L2 cache at a cost of (say) 50 cycles, and from there to L1 cache.
We can reduce run time by organizing our computations so as to minimize the
number of L1 and L2 cache misses.

Although several theoretical models for computers with multiple-level memories
have been proposed, these models have not found wide application,
and most of the work in the area of performance enhancement via cache optimization
has been experimentally oriented. Trace driven simulators have been
used to study the cache performance of a specific program running on a specific
computer, determine the portions of the code or the data structures that result
in a large fraction of the cache misses, and then optimize these code segments
and/or data structures. Trace driven simulations have also been used to develop
analytical models of cache behavior, and the literature describes several further
ways in which trace driven simulators have been used in cache performance
enhancement studies.



LaMarca and Ladner develop a model for a single-level direct-mapped
cache. They use this model to analyze the performance of binary heaps and
cache-aligned d-heaps. In other work, LaMarca and Ladner optimize the cache
performance of several sorting methods. Their cache optimized heapsort and
mergesort codes achieve a speedup of 1.85 and 1.38, respectively, when sorting
1,000,000 uniformly distributed integers on a Sparc 10 processor. Lam, Rothberg,
and Wolf have considered the cache performance of a blocked matrix multiply code
relative to a traditional matrix multiply code. They report a speedup of 4.3 for
their blocked matrix multiply code for a matrix of size 300. Sulatycke and Ghose
and Stewart have also studied the cache performance of various matrix
multiplication algorithms. Stewart reports that the best way to multiply
the matrices $A$ and $B$ is to first transpose $B$ and then use the classical three
loop algorithm on $A$ and $B^T$. He further reports that by simply reordering the
loops from the traditional ijk order to an ikj order (i.e., interchange the second
and third for loops in the traditional code) the code performance is about the
same as when square blocks are used; row blocks yield a greater
speedup than column blocks and ikj ordering. Note that the transpose method,
ikj ordering, square blocking, and row blocking deliver speedup relative to the
traditional ijk code by reducing cache misses. Stewart reports a speedup of
2.7 for the transpose method relative to the ijk code; both codes were written
in C and compiled using maximum compiler optimization; the matrix size was
1200, and the code was run on a SUN Ultra Enterprise 4000/5000 computer.

Al-Furaih and Ranka have studied cache optimization methods for sorting
and unstructured iterative computations.
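
The three loop organizations discussed above (ijk order, ikj order, and the transpose method) may be illustrated as follows. The sketch only shows the order in which the operands are traversed and that all three yield the same product; Python lists do not reproduce the cache behavior of the C codes reported on, and the function names are hypothetical.

```python
def matmul_ijk(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]  # innermost loop walks down a column of B
    return C


def matmul_ikj(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):  # second and third loops interchanged
            a = A[i][k]
            for j in range(n):
                C[i][j] += a * B[k][j]  # innermost loop walks along a row of B
    return C


def matmul_transpose(A, B):
    n = len(A)
    Bt = [list(col) for col in zip(*B)]  # transpose B once
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * Bt[j][k]  # both operands are read row-wise
    return C
```

In a row-major layout the ijk code strides through memory when it reads a column of $B$, whereas the other two variants read consecutive elements in the innermost loop.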

In this paper we propose a blocked formulation of Floyd’s dynamic programming
algorithm to find the lengths of the shortest paths between all pairs
of vertices in a graph. Blocked (or tiled) computation methods have been
used before. Our blocked algorithm provides a
speedup (relative to the unblocked algorithm) between 1.6 and 1.9 on a Sun
Ultra Enterprise 4000/5000 for graphs that have between 480 and 3200 vertices.
The measured speedup on an SGI O2 for graphs with between 240 and 1200
vertices is between 1.6 and 2. These speedups are comparable to the speedups
cited above for cache-optimized sorting and matrix multiplication codes on Sun
platforms.



In Section 2 we give Floyd’s all-pairs shortest-paths algorithm. Section 3 analyzes
the potential speedup benefits from reorganizing Floyd’s algorithm to make
better use of cache. This analysis uses data gathered using the cache simulation
tool Shade. Our blocked version of Floyd’s algorithm and a correctness proof
are given in Section 4. Section 5 gives measured speedup results for our blocked
algorithm.

2 Floyd’s All-Pairs Shortest-Paths Algorithm

Let $G = (V,E)$ be a directed graph with $n$ vertices. Let $cost$ be the cost adjacency
matrix for $G$. So $cost(i,i) = 0$, $1 \le i \le n$; $cost(i,j)$ is the length (or cost) of
edge $(i,j)$ if $(i,j) \in E(G)$ and $cost(i,j) = \infty$ if $i \ne j$ and $(i,j) \notin E(G)$.

In the all-pairs shortest-paths problem we are to determine a matrix $A$ such
that $A(i,j)$ is the length of a shortest path from $i$ to $j$. When $G$ has no cycle
whose length (cost) is less than 0, the matrix $A$ may be computed using dynamic
programming. Let $A^k(i,j)$ be the length of a shortest path from $i$ to $j$ under
the constraint that the path contain no intermediate vertex whose index is more
than $k$. It is easy to see that $A(i,j) = A^n(i,j)$. When $G$ has no cycle with
negative length, the following dynamic programming recurrence is valid:

$A^0(i,j) = cost(i,j)$    (1)

$A^k(i,j) = \min\{A^{k-1}(i,j),\ A^{k-1}(i,k) + A^{k-1}(k,j)\},\ k \ge 1$    (2)

Equations 1 and 2 lead to the algorithm of Figure 1 to compute $A$. This
algorithm is known as Floyd’s algorithm. It may be shown that AllPairs
computes $A^k(i,j) = A[i][j]$ in iteration $k$ of the outermost for loop.



function AllPairs(int A, int n)
{// A[i][j] = cost(i,j) initially
 // A[i][j] equals length of shortest
 // i to j path on termination
   for (k = 1; k <= n; k++)
      for (i = 1; i <= n; i++)
         for (j = 1; j <= n; j++)
            A[i][j] = min(A[i][j],
                          A[i][k] + A[k][j]);
}



Fig. 1. Floyd’s shortest-paths algorithm 
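
For readers who wish to experiment, the algorithm of Fig. 1 may be transcribed into runnable Python as follows. This is a sketch only: vertices are numbered from 0 rather than 1, a missing edge is represented by `math.inf`, and the name `all_pairs` is hypothetical.

```python
import math


def all_pairs(cost):
    """Return the matrix of shortest path lengths for a cost adjacency matrix.

    cost[i][j] is the edge length, math.inf if there is no edge,
    and 0 on the diagonal. The input matrix is not modified.
    """
    n = len(cost)
    A = [row[:] for row in cost]  # A^0 = cost
    for k in range(n):
        for i in range(n):
            for j in range(n):
                A[i][j] = min(A[i][j], A[i][k] + A[k][j])
    return A


if __name__ == "__main__":
    INF = math.inf
    print(all_pairs([[0, 3, INF], [INF, 0, 1], [2, INF, 0]]))
```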



3 Upper Bound on Attainable Speedup 

We compute an upper bound on the maximum speedup attainable by rearranging
the computation of Figure 1 so as to optimize cache usage. In computing this
bound we assume that any rearrangement of the computation will not decrease
the number of accesses made to the elements of the array A.

We first obtain an equation to estimate the execution/run time of Floyd’s
algorithm of Figure 1. The execution time of a program is given by the following
equation:

execution time = (CPU clock cycles + memory stall cycles)
                 × clock cycle time    (3)

where memory stall cycles is the number of cycles the CPU spends waiting for
a memory reference to complete. The following equations are from the same source.

CPU clock cycles = CPI × IC    (4)

memory stall cycles = number of L1 misses × L1 miss penalty    (5)

number of L1 misses = IC × L1 misses per instruction    (6)

L1 misses per instruction = memory references per instruction
                            × L1 miss rate    (7)

where IC is the instruction count, CPI is the clock cycles per instruction,
L1 miss penalty is the number of cycles the CPU waits when there is an L1
cache miss, and L1 miss rate is the number of L1 misses per memory reference.
From these equations we obtain:

execution time = (CPI × IC + IC × L1 misses per instruction
                  × L1 miss penalty)
                 × clock cycle time    (8)



We also see that

L1 miss penalty = L2 hit time + L2 miss rate × L2 miss penalty    (9)

where L2 hit time is the number of cycles to load an L1 cache line from L2
cache and L2 miss penalty = memory hit time is the number of cycles needed
to load an L2 cache line from main memory.

We use Equations 8 and 9 to estimate the run time of Floyd’s algorithm.
Since the L2 hit time and L2 miss penalty are architecture dependent and not
available to us, we use typical numbers for these: the L2 hit time is assumed to
be between 6 and 10 cycles and the L2 miss penalty is assumed to be 50 cycles.
For the L1 misses per instruction and the L2 miss rate we use data obtained by
using the cache simulator Shade on Floyd’s algorithm. Table 1 gives this data.

Table 1. Cache simulator data for algorithm of Figure 1

Matrix size   L1 misses per instruction (%)   L2 miss rate (%)
480           3.950                           18.42
800           4.106                           19.17
1600          4.133                           19.42
2400          4.826                           19.64
3200          5.553                           20.07


Now we obtain a lower bound on the run time of a cache optimized version
of Floyd’s algorithm. Substituting Equation 7 into Equation 8 and making the
reasonable assumption that cache optimization will not decrease the total number
of memory references (i.e., the number of memory references for the cache
optimized code is at least IC × memory references per instruction where IC
and memory references per instruction are for AllPairs) yields

execution time ≥ (CPI × IC + IC × memory references per instruction
                  × L1 miss rate × L1 miss penalty)
                 × clock cycle time    (10)

The cache simulator gives 0.35 as the memory references per instruction for
AllPairs. Substituting 0.35 for the number of memory references per instruction
and the right side of Equation 9 for the L1 miss penalty into Equation 10 we
get

execution time ≥ (CPI × IC + 0.35 × IC × L1 miss rate
                  × (L2 hit time + L2 miss rate × L2 miss penalty))
                 × clock cycle time    (11)

We may obtain a lower bound for the L1 and L2 miss rate by determining the
minimum number of L1 and L2 misses that every reorganized version of Figure 1
must make. Since we intend to declare i, j, k, and n as register variables,
references to these variables do not access cache and so do not cause any cache
misses. Therefore, we focus on cache misses attributable to the array A. For our
analysis we use the cache characteristics of the Sun Enterprise 4000/5000 that
are shown in Table 2. By direct mapped we mean that each byte of main memory
has exactly one byte of cache to which it may be mapped. The line size of a cache
gives the unit of memory transfer. So in the Sun Enterprise 4000/5000 an L1
cache miss results in a 32-byte block of data being transferred from L2 cache
into L1 cache. The transferred block is one-half of an L2 line.

Table 2. Cache characteristics of the Sun Enterprise 4000/5000

Cache   Associativity   Cache size   Line size
L1      Direct mapped   16KB         32 bytes
L2      Direct mapped   4MB          64 bytes

For the analysis we assume that A is an integer array and that each integer
is 4 bytes. Since Floyd’s algorithm accesses each of the $n^2$ elements of A, all $n^2$
elements of A must get to L1 cache at some time. Each L1 cache miss brings
in exactly 32 bytes of data (i.e., 8 elements of A). Therefore, the number of L1
cache misses is at least $n^2/8$. By a similar reasoning, the number of L2 cache
misses is at least $n^2/16$. Further, Floyd’s algorithm makes $3n^3$ read accesses to
A (i.e., in the right side of the min statement of Figure 1) and $n^3$ write accesses
(the left side of the min statement). We note that when the min statement of
Figure 1 is coded as an if statement, write accesses are made only when the
new A[i][j] value is smaller than the old one. In this case the number of write
accesses ranges from 0 to $n^3$. To keep the analysis simple, we use $n^3$ as the
write access count. The total number of accesses to A (read and write) is $4n^3$.
Therefore,



L1 miss rate = L1 misses per A reference
             $\ge (n^2/8)/(4n^3) = 1/(32n)$    (12)

L2 miss rate = L2 misses per A reference
             $\ge (n^2/16)/(4n^3) = 1/(64n)$    (13)

The equality between the miss rate and the misses per A reference follows
from our assumption that variables other than A will be register variables and
so all memory references are to elements of A. Since we assume that cache
optimization does not reduce the number of A references, these bounds apply to
all cache optimized versions of AllPairs.

Substituting the bounds of Equations 12 and 13 into Equation 11 we get the
following lower bound on the run time of a cache optimized version of Floyd’s
algorithm.

execution time ≥ (CPI × IC + 0.35 × IC × 1/(32n)
                  × (L2 hit time + 1/(64n) × L2 miss penalty))
                 × clock cycle time    (14)

Dividing Equation 8 by Equation 14 yields an upper bound on the speedup
obtainable by optimizing cache utilization. Figure 2 plots this upper bound when
CPI ranges between 1 and 2, L2 hit time ranges from 6 to 10 cycles, and L2
miss penalty is 50 cycles. The L1 misses per instruction and the L2 miss rate
are taken from Table 1. Figure 2 gives the maximum speedup we can get by
optimizing the cache usage of Floyd’s algorithm on typical computers that have
a two-level cache.




Fig. 2. Maximum achievable speedup for different matrix sizes 
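
The ratio of Equation 8 to Equation 14 is easy to evaluate numerically, since IC and the clock cycle time cancel. The following sketch does so for one choice of parameters; the default values (CPI of 1, L2 hit time of 10 cycles) are single points picked from the ranges stated in the text, and the function name is hypothetical.

```python
def speedup_upper_bound(n, l1_misses_per_instr, l2_miss_rate,
                        cpi=1.0, l2_hit_time=10.0, l2_miss_penalty=50.0,
                        mem_refs_per_instr=0.35):
    """Ratio of Equation 8 to Equation 14 (IC and clock cycle time cancel).

    l1_misses_per_instr and l2_miss_rate are fractions, so the Table 1
    entry 3.950% is passed as 0.03950.
    """
    # Cycles per instruction of the unblocked code (Equations 8 and 9).
    measured = cpi + l1_misses_per_instr * (
        l2_hit_time + l2_miss_rate * l2_miss_penalty)
    # Lower bound using the miss rates of Equations 12 and 13.
    bound = cpi + mem_refs_per_instr * (1.0 / (32 * n)) * (
        l2_hit_time + (1.0 / (64 * n)) * l2_miss_penalty)
    return measured / bound


if __name__ == "__main__":
    # First row of Table 1: n = 480, 3.950% and 18.42%.
    print(speedup_upper_bound(480, 0.03950, 0.1842))
```

Because the lower-bound miss rates shrink like $1/n$, the denominator is close to CPI, so the bound is governed almost entirely by the measured stall cycles of the unblocked code.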



4 Blocked Version of Floyd’s Algorithm 

4.1 The Algorithm 

We partition the cost adjacency matrix into submatrices of size $B \times B$. $B$ is called
the blocking factor. Although this is not necessary, we assume, for simplicity, that
$B$ divides $n$. Our blocked version of Floyd’s algorithm (Figure 3) will perform
$B$ iterations of the outermost loop of Figure 1 on each $B \times B$ block of A before
advancing to the next $B$ iterations. It is convenient to think of each set of $B$
iterations as divided into three phases. (Note that our implementation does not
actually perform the computation in the three phase order described below.)
For example, in phase 1 of the first set of $B$ iterations, Equation 2 is used to
compute $D^k = A^k$, $1 \le k \le B$, for the elements in the top left block, block (1,1).
Since these $B$ iterations access only the A elements within block (1,1), we say
that block (1,1) is a self-dependent block in the first $B$ iterations.

In phase 2 of the first $B$ iterations a modified Equation 2 is used to compute
$D^k$, $1 \le k \le B$, for the remaining blocks (1,*) and (*,1) that are on the same
row or column as the self-dependent block. For the remaining (1,*) blocks the
modified Equation 2 is

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^B(i,k) + D^{k-1}(k,j)\},\ k \ge 1$    (15)

For the remaining (*,1) blocks the modified Equation 2 is

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^{k-1}(i,k) + D^B(k,j)\},\ k \ge 1$    (16)

In phase 3, $D^k$, $1 \le k \le B$, is computed for the remaining blocks (i.e., for
blocks that are not on the same row or column as the self-dependent block).
This computation is done using Equation 17.

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^B(i,k) + D^B(k,j)\},\ k \ge 1$    (17)

Fig. 3. Blocks computed in each phase: (a) phases when (1,1) is the self-dependent
block; (b) phases when block (t,t) is the self-dependent block. Legend: currently
computing; computation over; computations to be done.

Phase 3 is followed by the next round of $B$ iterations. These are also done in
three phases. This time block (2,2) is the self-dependent block. $D^k$, $B < k \le 2B$,
are computed for the self-dependent block in phase 1 using the equation

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^{k-1}(i,k) + D^{k-1}(k,j)\}$    (18)

In phase 2 $D^k$, $B < k \le 2B$, are computed for the remaining blocks that
are on the same row or column as the self-dependent block and in phase 3 $D^k$,
$B < k \le 2B$, is computed for the blocks that are not on the same row or column
as the self-dependent block. The phase 2 computation uses the following equation
for the (2,*) blocks

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^{2B}(i,k) + D^{k-1}(k,j)\}$    (19)

The (*,2) blocks use the following equation

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^{k-1}(i,k) + D^{2B}(k,j)\}$    (20)

and the phase 3 blocks use the equation

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^{2B}(i,k) + D^{2B}(k,j)\}$    (21)

In general, when block (t,t) is the self-dependent block, the following equations
are used to compute the (t,*), (*,t), and phase 3 blocks, respectively.

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^{tB}(i,k) + D^{k-1}(k,j)\}$    (22)

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^{k-1}(i,k) + D^{tB}(k,j)\}$    (23)

$D^k(i,j) = \min\{D^{k-1}(i,j),\ D^{tB}(i,k) + D^{tB}(k,j)\}$    (24)
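
The three phases may be sketched in Python as follows. This is an illustration of the phase structure only, not the implementation that was measured (which, as noted, does not execute the phases in this order). It assumes that $B$ divides $n$, numbers vertices and blocks from 0, and uses the hypothetical names `blocked_all_pairs` and `update`.

```python
def blocked_all_pairs(cost, B):
    """Blocked Floyd: process B values of k at a time, block by block."""
    n = len(cost)
    if n % B != 0:
        raise ValueError("B must divide n")
    D = [row[:] for row in cost]  # D^0 = cost
    blocks = n // B

    def update(bi, bj, t):
        """Apply the B iterations of round t to block (bi, bj)."""
        for k in range(t * B, (t + 1) * B):
            for i in range(bi * B, (bi + 1) * B):
                for j in range(bj * B, (bj + 1) * B):
                    if D[i][k] + D[k][j] < D[i][j]:
                        D[i][j] = D[i][k] + D[k][j]

    for t in range(blocks):
        update(t, t, t)                  # phase 1: self-dependent block
        for b in range(blocks):          # phase 2: same row or column
            if b != t:
                update(t, b, t)
                update(b, t, t)
        for bi in range(blocks):         # phase 3: all remaining blocks
            for bj in range(blocks):
                if bi != t and bj != t:
                    update(bi, bj, t)
    return D
```

With $B = n$ there is a single block and the sketch reduces to the unblocked loop nest; for smaller $B$ the result after the last round agrees with the unblocked algorithm, although intermediate values may differ.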



4.2 Correctness of Blocked Algorithm 

The $D^k(i,j)$ values computed by the blocked algorithm are not necessarily the
same as the $A^k(i,j)$ values computed by the unblocked algorithm. For example,
when $B = 4$, the unblocked algorithm computes
$A^1(4,7) = \min\{A^0(4,7),\ A^0(4,1) + A^0(1,7)\}$, whereas the blocked algorithm computes
$D^1(4,7) = \min\{D^0(4,7),\ D^4(4,1) + D^0(1,7)\} = \min\{A^0(4,7),\ D^4(4,1) + A^0(1,7)\}$.
Since $D^4(4,1) = A^4(4,1)$ is $\le A^0(4,1)$, $D^1(4,7) \le A^1(4,7)$.

To establish the correctness of the blocked algorithm we must show that
$D^n(i,j) = A^n(i,j)$ for all $i$ and $j$. That is, even though $D^k(i,j)$ and $A^k(i,j)$
may not be equal for $k < n$, the values agree in the end when $k = n$. Actually
we will show that $A$ and $D$ agree at the end of each set of $B$ iterations. That
is, $D^k(i,j) = A^k(i,j)$ for all $i$ and $j$ whenever $k$ is a multiple of $B$. Hence
$D^n(i,j) = A^n(i,j)$ for all $i$ and $j$.

Let $k = qB$. The proof is by induction on $q$. We may show that
$D^{qB}(i,j) = A^{qB}(i,j)$ for all $i$ and $j$ for $0 \le q \le n/B$. The proof is omitted from this version
of the paper.

4.3 Optimal Blocking Factor

When computing the $D$ values in a block during any round (i.e., an iteration
of the outermost loop) of function BlockedAllPairs, at most three blocks are
active. The computation for the self-dependent block accesses elements only in
the self-dependent block. So during the self-dependent block computation only
1 block is active. The computation for a block $R$ that is on the same row or
column as the self-dependent block accesses elements in $R$ as well as elements in
the self-dependent block. Therefore, 2 blocks are active during the computation
for $R$. For a block $R$ that is not on the same row or column as the self-dependent
block, BlockedAllPairs accesses elements from 3 blocks: block $R$, the block
that is in the same row as the self-dependent block and the same column as $R$,
and the block that is in the same column as the self-dependent block and in the
same row as $R$. Therefore, L1 cache misses are minimized by choosing the largest
block size $B$ such that 3 block loads of the array $D$ fit into L1 cache. Suppose
that the elements of $D$ are 4-byte integers and that our L1 cache capacity is $C$
bytes and that each L1 cache line is $S$ bytes. We must choose $B$ to be the largest
integer such that $3B^2 \cdot 4 \le C$ (equivalently, $B \le \sqrt{C/12}$) and $B$ is a multiple of
$S/4$. The second requirement is necessary as the smallest unit of data brought
into L1 cache is $S$ bytes and these $S$ bytes are contiguous bytes of memory.

For the Sun Ultra Enterprise 4000/5000 $C$ = 16K and $S$ = 32. Therefore,
the blocking factor should be the largest integer that is $\le \sqrt{C/12} \approx 37$ and is a
multiple of 32/4 = 8. That is, we should use $B = 32$ as the blocking factor. For
the SGI O2 $C$ = 32K and $S$ = 32. The optimal blocking factor for the SGI O2
is the largest integer that is $\le \sqrt{C/12} \approx 52$ and is a multiple of 32/4 = 8. This
optimal blocking factor is 48.
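
The choice of blocking factor can be checked with a few lines of Python. The sketch simply searches for the largest multiple of $S/4$ that satisfies $3B^2 \cdot 4 \le C$; the function name and the default element size of 4 bytes are assumptions for illustration.

```python
def optimal_blocking_factor(cache_bytes, line_bytes, elem_bytes=4):
    """Largest B with 3 * B * B * elem_bytes <= cache_bytes such that
    B is a multiple of line_bytes / elem_bytes (elements per cache line)."""
    step = line_bytes // elem_bytes
    B = 0
    # Grow B one cache line of elements at a time while three blocks still fit.
    while 3 * (B + step) ** 2 * elem_bytes <= cache_bytes:
        B += step
    return B


if __name__ == "__main__":
    print(optimal_blocking_factor(16 * 1024, 32))  # Sun Ultra Enterprise L1
    print(optimal_blocking_factor(32 * 1024, 32))  # SGI O2 L1
```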

5 Experimental Results 

The speedup of our blocked shortest paths algorithm relative to the standard
unblocked algorithm was measured by programming the two algorithms in C++
(the g++ compiler with optimization option o5 was used) and running the two
programs on a Sun Ultra Enterprise 4000/5000 and an SGI O2. Both programs
were compiled using the highest level of compiler optimization possible.

We first present the results for the SUN Ultra Enterprise. Figure 4 gives the
measured speedups for different blocking factors and different $n$. As predicted
by our analysis, the optimal blocking factor is 32 for all $n$.

Figure 5 compares the speedup obtained by BlockedAllPairs and the maximum
speedup possible by optimizing cache utilization. The curve for maximum
possible speedup is that of Figure 2.

The speedup obtained by BlockedAllPairs is fairly close to the maximum
possible. One reason we do not achieve the predicted maximum speedup is that
the total instruction count for BlockedAllPairs is more than that for AllPairs.
Recall that in determining the maximum speedup curve of Figure 2 we assumed
that the instruction count for the cache optimized algorithm is the same as that
of AllPairs.

Fig. 4. Speedup of BlockedAllPairs on a Sun Ultra Enterprise




Fig. 5. Measured and maximum possible speedup 



Figure 6 gives the L1 misses per instruction for the unblocked and blocked
versions of Floyd’s algorithm. The data for this figure were obtained using the
cache simulator Shade. As expected the blocked code shows better cache utilization.

Table 3 shows the cache details for the SGI O2 computer and Figure 7 shows
the speedup obtained by the blocked algorithm on an SGI O2. Except for one
anomaly, maximum speedup is obtained when the blocking factor is the predicted
optimal factor of 48.

Fig. 6. Misses per instruction for unblocked and blocked algorithms

Table 3. Cache configuration of SGI

Cache type   Cache size
L1           32KB
L2           1MB

Fig. 7. Speedup obtained by BlockedAllPairs on an SGI O2 (horizontal axis: blocking factor)

6 Conclusion

We have developed a blocked version of Floyd’s all-pairs shortest-paths algorithm.
Experimental results show that the blocked version obtains speedups close
to the maximum possible for a cache optimized version of Floyd’s algorithm.

Abstract. Recently external memory graph algorithms have received
considerable attention because massive graphs arise naturally in many 
applications involving massive data sets. Even though a large number of 
I/O-efficient graph algorithms have been developed, a number of funda- 
mental problems still remain open. In this paper we develop an improved 
algorithm for the problem of computing a minimum spanning tree of a 
general graph, as well as new algorithms for the single source shortest 
paths and the multi-way graph separation problems on planar graphs. 



1 Introduction 

Recently external memory graph algorithms have received considerable attention 
because massive graphs arise naturally in many applications involving massive 
data sets. One example of a massive graph is AT&T’s 20TB phone-call data 
graph [HI- Other examples of massive graphs arise in Geographic Information 
Systems (GIS). For instance, GIS terrains are often represented using planar 
graphs and many common GIS problems can be formulated as standard graph 
problems (Arc/Info, the most commonly used GIS package, contains functions
that correspond to computing depth-first, breadth-first, and minimum spanning
trees, as well as shortest paths and connected components). When working with
such massive graphs the I/O communication, and not the internal memory
computation time, is often the bottleneck. Designing efficient external memory
algorithms for such problems can thus lead to considerable runtime improvements,
as for example illustrated in our previous work [7].

Even though a large number of I/O-efficient graph algorithms have been
developed in recent years, a number of important problems still remain open. For
example, developing efficient algorithms for basic problems such as breadth-first
search and depth-first search remains open. In this paper we develop I/O-efficient
algorithms for the minimum spanning tree (MST) and single source shortest
paths (SSSP) problems, as well as for multi-way planar graph separation.

1.1 Problem Statement 

MST and SSSP are well-known problems on a weighted graph G = (V, E).¹ MST
is the problem of finding a spanning tree for G of minimum weight and SSSP is
the problem of finding the shortest paths from a given source vertex in G to all
other vertices in G (the length of a path is the sum of the weights of the edges
on the path).

Consider an undirected graph G = (V, E). An f(V)-separator of G is a
subset S of the vertices of G of size f(V) such that the removal of S disconnects G
into two subgraphs G_1 and G_2, each of size at most 2V/3. Lipton and Tarjan [23]
proved that any planar graph has an O(√V)-separator and gave a linear time
algorithm for finding such a separator. Using this result recursively, a planar
graph can be decomposed into Θ(N/R) subgraphs G_i with O(R) vertices each and
O(N/√R) separator vertices, such that there is no edge between a vertex in G_i
and a vertex in G_j for i ≠ j. We call such a decomposition a multi-way planar
graph separation of G. Graph separation is often used in the design of
divide-and-conquer algorithms.

Throughout this paper we assume that the input graph G is given in edge-list 
representation. If G is planar we assume it is embedded in the plane. We also 
assume without loss of generality that G is connected and that no two edges 
have the same weight. In some of our algorithms we will assume that a breadth- 
first-search tree T of G is given. In such cases we assume that T is represented 
implicitly by storing with each vertex u in G its parent in T and marking every
edge of G as either a tree or a non-tree edge. 

1.2 Previous Results on I/O-Efficient Graph Algorithms

We work in the standard two-level I/O model with one (logical) disk [3120] . The 
model defines the following parameters: 

N = V + E, 

M = number of vertices/edges that can fit into internal memory, 

B = number of vertices/edges per disk block, 

where M < N and 1 < R < for some e > 00 An Input/Output (or 

simply I/O) involves reading (or writing) a block from disk into (from) internal 

¹ For convenience we will use the name of a set to denote both the actual set and its
cardinality.

^ Often it is only assumed that B < M/2 but sometimes, as in this paper, the very 
realistic assumption that the main memory is capable of holding elements is 
made (or as here, for some £ > 0). 
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memory. Our measure of performance of an algorithm is the number of I/Os 
it performs. The number of I/Os needed to read N contiguous items from disk
is scan(N) = O(N/B) (the scanning bound), and the number of I/Os required to
sort N items is sort(N) = O((N/B) · log_{M/B}(N/B)) [3] (the sorting bound). In
practice the difference between an algorithm doing N I/Os and one doing scan(N)
or sort(N) I/Os can be significant [2].
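
To get a feeling for the gap between N, sort(N) and scan(N) I/Os, the bounds can be evaluated numerically. The following is a minimal sketch that drops all constant factors; the function names `scan_ios` and `sort_ios` and the parameter values used in the usage note are illustrative assumptions, not part of the model.

```python
import math


def scan_ios(n, B):
    """Scanning bound: number of blocks needed to read n contiguous items."""
    return math.ceil(n / B)


def sort_ios(n, M, B):
    """Sorting bound (N/B) * log_{M/B}(N/B), ignoring constant factors.

    The logarithm is clamped to at least 1 so that inputs that fit in a
    few memory loads still cost one pass.
    """
    blocks = n / B
    return blocks * max(1.0, math.log(blocks, M / B))
```

For example, with the hypothetical values N = 10^9, M = 10^8 and B = 10^4, `scan_ios` gives 10^5 block transfers and `sort_ios` gives roughly 1.25 · 10^5, both about four orders of magnitude below N.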



Table 1. Best known upper bounds for basic graph theoretic problems.

Problem    General undirected graphs
DFS        O((V/M) · (E/B) + V)
           O((V + scan(E)) · log(V/B) + sort(E))
BFS        O(V + (E/V) · sort(V))
CC         O(sort(E) · log log(VB/E))
MST        O(sort(E) · log(V/M)) [12]
           O(sort(E) · log B + scan(E) · log V) [22]
SSSP       O(V + (E/B) · log V)



I/O-efficient graph algorithms have been considered by a number of authors.
Table 1 reviews the best known algorithms for basic graph theoretic problems on
general undirected graphs. For directed graphs the best known algorithms for
breadth-first search (BFS) and depth-first search (DFS) use
O((V + scan(E)) · log(V/B) + sort(E)) I/Os. Lower bound results were proved
in [12, 25]. Note that no O(sort(E)) (deterministic) algorithm is known for any
of the problems, and that the best known algorithms for DFS, BFS and SSSP
require Ω(V) I/Os. MST and connected components (CC) can be solved in
O(sort(E)) I/Os with randomized algorithms.

Improved algorithms have been developed for several special classes of graphs.
For trees, O(sort(N)) algorithms are known for BFS and DFS numbering, Euler
tour computation, expression tree evaluation, topological sorting, as well as
several other problems. For planar graphs, O(sort(N)) algorithms are
known for CC and MST. For grid graphs O(sort(N)) algorithms are known
for BFS and SSSP, and an O(scan(N)) algorithm for CC [7]. See [3S1 for a
complete reference. 

Given that even very basic graph problems seem hard to externalize, it is 
natural to try to reduce the problems to one another. A first step in this direction 
was taken by Hutchinson et al. HH] who considered the problem of computing 
an O(√N)-separator of a planar graph I/O-efficiently. Given a BFS tree they
showed how to compute a separator in O(sort(N)) I/Os. Given this algorithm,
it is straightforward to solve the multi-way planar graph separation problem in 
0(log ^ ■ sort (IV))) I/Os, simply by applying the algorithm recursively. 
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1.3 Our Results 



In Section 2 we give an O(sort(E) · log log(VB/E)) = O(sort(E) · log log B) algorithm
for the MST problem on general undirected weighted graphs, improving the
previous bound of O(sort(E) · log B + scan(E) · log V) [22]. The algorithm uses
the same general idea as the CC algorithm by Munagala and Ranade [25] and
consists of two phases: first a vertex contraction algorithm is used to reduce the
number of vertices to O(E/B), and then an O(V + sort(E)) MST algorithm is used
on the reduced graph. The new contraction algorithm uses ideas similar to the
ones used in [8, 14, 25], as well as a simplified version of the basic contraction
step used in previous MST algorithms [8, 12, 13, 14, 22, 25, 28]. The new
O(V + sort(E)) MST algorithm is a modified version of Prim's algorithm. It remains a
challenging open problem to develop an O(sort(E)) MST algorithm.

In Sections 3 and 4 we show that the multi-way planar graph separation
problem and the SSSP problem can be reduced to the BFS problem in O(sort(N))
I/Os: In Section 3 we give an O(sort(N)) algorithm for the multi-way planar
graph separation problem given a BFS tree. The algorithm improves the
straightforward bound of 0(log ^ •sort(fV)) I/Os and uses a divide-and-conquer 
algorithm based on ideas from m- In Section 0 we show how to use this result 
to solve the SSSP problem in O(sort(N)) I/Os. The algorithm is a generalization
of our SSSP algorithm on grid graphs [7] and uses ideas similar to the ones
utilized by Frederickson [13]. We believe that our O(sort(N)) graph separation
algorithm might prove helpful in reducing other problems on planar graphs to
the BFS problem. It remains a challenging problem to develop an O(sort(E))
BFS algorithm. Another interesting open problem is whether it is possible to develop
an O(sort(E)) BFS algorithm for a planar graph given a multi-way separation
of the graph.



2 Minimum Spanning Tree on General Graphs 

In this section we describe our MST algorithm on general undirected weighted
graphs. The basic idea is to reduce the number of vertices to E/B using an
O(sort(E)) vertex reduction algorithm O(log log(VB/E)) times, and then use an
O(V + sort(E)) MST algorithm on the resulting graph. The overall I/O
complexity will thus be O(sort(E) · log log(VB/E) + E/B + sort(E)) =
O(sort(E) · log log(VB/E)) I/Os. In Section 2.1 we first describe the
O(V + sort(E)) MST algorithm, and in Section 2.2 we then describe the reduction
algorithm. The MST result is summarized in the following theorem.

Theorem 1. The MST of an undirected weighted graph can be found in
O(sort(E) · log log(VB/E)) I/Os.

2.1 An O(V + sort(E)) MST Algorithm

Our algorithm is a modified version of Prim’s internal memory algorithm HS|. 
The idea of Prim’s algorithm is to grow the MST iteratively from a source node 
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while maintaining a priority queue on the vertices not included in the MST so 
far; the priority of a vertex is the weight of the minimum edge connecting it to the 
current MST. The algorithm repeatedly extracts the minimum priority vertex 
v, adds it to the MST, and updates the priority of the vertices u adjacent to v.
Specifically, the weight w of edge (v, u) is compared with the priority of vertex u
in the priority queue, and an update is performed if w is smaller than the current
priority. Prim's algorithm cannot be implemented efficiently in external memory,
the main reason being that the current priority of a given vertex cannot in general
be obtained without doing one I/O. A direct implementation would thus lead to
an O(E) I/O bound. Previously known algorithms [12, 22] rely instead on vertex
contraction methods.

Our modification of Prim's algorithm consists of storing edges in the priority
queue instead of vertices. During the algorithm the priority queue contains (at
least) all edges connecting vertices in the current MST with vertices not in the
tree. The queue can also contain edges between two vertices in the MST. The
algorithm works as follows: Repeatedly perform extract-min to extract the
minimum weight edge (u, v) from the priority queue. If v is already in the MST the
edge is discarded. Otherwise v is included in the MST and all edges incident to
v, except (u, v), are inserted in the priority queue. The key to the I/O-efficiency
of the algorithm is that because we store edges in the priority queue we have a
simple way of checking whether a vertex is already included in the MST: as all
edges incident to v are inserted in the priority queue when v is included in the
MST, it follows that if both u and v are in the MST when processing an edge
e = (u, v), the edge e must appear in the priority queue twice. Thus we can check
if v is already included in the MST simply by performing one more extract-min
and checking if it returns the same edge e (recall that we assume that no two
edges have the same weight).
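
The edge-based variant of Prim's algorithm described above can be sketched in internal memory as follows. This is only an illustration under stated assumptions: Python's `heapq` stands in for the external priority queue, the adjacency lists are held in a dictionary rather than on disk, the graph is simple and connected, and all edge weights are distinct. The function name `edge_prim_mst` is hypothetical. Note that no "is v in the tree" lookup is ever performed; membership is detected only through the duplicate edge at the top of the queue.

```python
import heapq
from collections import defaultdict


def edge_prim_mst(edges, source):
    """Edge-based Prim variant; edges is a list of (u, v, w), weights distinct.

    Queue entries are (w, x, y): edge (x, y) of weight w, inserted when x
    joined the tree. Returns the list of tree edges as (x, y, w).
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))

    pq = list(adj[source])
    heapq.heapify(pq)
    tree = []
    while pq:
        w, u, v = heapq.heappop(pq)
        # If the same edge is again the minimum, both endpoints are already
        # in the tree: drop both copies (weights are distinct, so an equal
        # weight identifies the same edge).
        if pq and pq[0][0] == w:
            heapq.heappop(pq)
            continue
        tree.append((u, v, w))
        # v joins the tree: insert all its incident edges except (u, v).
        for entry in adj[v]:
            if entry[2] != u:
                heapq.heappush(pq, entry)
    return tree
```

On the five-edge graph with edges (0,1,1), (1,2,2), (0,2,3), (2,3,4), (1,3,5) and source 0, the edges of weight 3 and 5 are each extracted twice in a row and discarded, and the returned tree has weights 1, 2 and 4.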

The algorithm performs at least one I/O for each vertex which is included in
the MST in order to read its adjacent vertices (traverse its adjacency list). Thus
processing all vertices and edges takes O(V + E/B) I/Os. It also performs O(E)
insert and extract-min operations on the priority queue. Using an external
priority queue supporting these operations in O((1/B) · log_{M/B}(N/B)) I/Os
amortized we obtain:

Lemma 1. The MST of an undirected weighted graph can be computed in
O(V + sort(E)) I/Os.

2.2 MST Vertex-Reduction Algorithm 

Our MST vertex reduction algorithm is obtained using ideas from the
connected-component algorithm of Munagala and Ranade and the notion of "blocking
values". The standard MST algorithm based on vertex contraction proceeds in
⌈log V⌉ phases [12, 22]. In each phase the minimum cost edge adjacent to every
vertex v is selected and output as part of the MST and the vertices connected by
the selected edges are contracted to supervertices. Let the size of a supervertex
be the number of vertices it contains from the original graph. After the i-th phase
the size of every supervertex is at least 2^i. Since one contraction phase can be
performed in O(sort(E)) I/Os, this results in an O(sort(E) · log V) algorithm.
The algorithm in [22] utilizes that a contraction step can be performed more
efficiently after O(log B) phases and obtains an O(sort(E) · log B + scan(E) · log V)
algorithm.
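
A single phase of the standard contraction algorithm can be sketched as follows. This is an internal-memory illustration only, assuming a connected graph with vertices numbered 0..n-1 and distinct edge weights; a union-find structure replaces the sorting-based contraction used in external memory, and the name `contraction_phase` is hypothetical.

```python
def contraction_phase(num_vertices, edges):
    """One contraction phase on edges given as (u, v, w) with distinct weights.

    Every vertex selects its lightest incident edge; the selected edges are
    output and their endpoints merged into supervertices.
    Returns (selected_edges, number_of_supervertices, contracted_edges).
    """
    best = {}
    for u, v, w in edges:
        for x in (u, v):
            if x not in best or w < best[x][2]:
                best[x] = (u, v, w)

    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    selected = sorted(set(best.values()), key=lambda e: e[2])
    for u, v, _ in selected:
        parent[find(u)] = find(v)

    # Renumber the supervertices and drop edges inside a supervertex.
    roots = sorted({find(x) for x in range(num_vertices)})
    new_id = {r: i for i, r in enumerate(roots)}
    contracted = [(new_id[find(u)], new_id[find(v)], w)
                  for u, v, w in edges if find(u) != find(v)]
    return selected, len(roots), contracted
```

Since every vertex is merged with at least one neighbour, each phase at least halves the number of supervertices, which is the source of the ⌈log V⌉ bound on the number of phases.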

Our algorithm runs for ⌈log(VB/E)⌉ phases after which the number of
supervertices is at most E/B. Furthermore we reduce the number of I/Os used in the
process by dividing the ⌈log(VB/E)⌉ phases into superphases requiring O(sort(E))
I/Os each: Let N_i = 2^((3/2)^i), i.e. N_{i+1} = N_i · √N_i. Superphase i, for i ≥ 0,
consists of ⌈log √N_i⌉ phases. In a preprocessing step we run the basic vertex
contraction algorithm once to ensure that the number of vertices before superphase 0
is V_0 ≤ V/2 = V/N_0. We will maintain the invariant that before superphase i the
number of supervertices is at most V/N_i. To reduce the number of vertices to at
most E/B it is therefore sufficient to perform O(log_{3/2} log(VB/E)) =
O(log log(VB/E)) superphases and we obtain the O(sort(E) · log log(VB/E)) algorithm.
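
The doubly exponential growth of the superphase sizes can be made concrete with a small calculation. The sketch below assumes the schedule N_0 = 2 and N_{i+1} = N_i · √N_i and stops as soon as V/N_i ≤ E/B; the function name `superphase_schedule` and the numbers in the usage note are illustrative assumptions.

```python
import math


def superphase_schedule(V, E, B):
    """List (i, N_i, phases in superphase i) until V / N_i <= E / B.

    Assumes N_0 = 2 and N_{i+1} = N_i * sqrt(N_i); superphase i runs
    ceil(log2(sqrt(N_i))) contraction phases.
    """
    target = V * B / E  # smallest N_i for which V / N_i <= E / B
    schedule = []
    n_i, i = 2.0, 0
    while n_i < target:
        phases = math.ceil(math.log2(math.sqrt(n_i)))
        schedule.append((i, n_i, phases))
        n_i *= math.sqrt(n_i)
        i += 1
    return schedule
```

For a hypothetical sparse graph with V = E = 10^6 and B = 1024, the target ratio VB/E is 1024 and the schedule contains six superphases, in line with log_{3/2} log 1024 ≈ 5.7.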

The phases in each superphase only work on a subset of the (remaining)
edges. The edge subsets are chosen in order to allow each supervertex to grow
by a factor of √N_i in superphase i. Let G_i = (V_i, E_i) be the graph just prior to
superphase i. We construct a graph G'_i = (V_i, E'_i), where E'_i is a subset of E_i.
For each vertex v, E'_i contains the ⌈√N_i⌉ lightest edges adjacent to v. Heavier
edges e = (u, v) adjacent to v are only included in E'_i if e is among the ⌈√N_i⌉
lightest edges adjacent to u. We define the blocking value of v to be the weight
of the (⌈√N_i⌉ + 1)-th lightest edge adjacent to v. The set E'_i and the blocking
values can be computed using O(sort(E_i)) I/Os. If we guarantee that V_i ≤ V/N_i
as stated above, it follows that E'_i ≤ 2V_i⌈√N_i⌉ = O(V/√N_i). As each contraction
phase in superphase i can be performed in O(sort(E'_i)) I/Os, it follows that
superphase i requires O(sort(E_i) + sort(E'_i) · log √N_i) =
O(sort(E) + sort(V/√N_i) · log √N_i) = O(sort(E)) I/Os. After performing all the
phases of superphase i the edges E_i − E'_i, i.e. the heavy edges which were not
included in the sample, need to be re-incorporated in E_{i+1}. This can easily be
done using O(sort(E)) I/Os in total. Details will appear in the full paper.
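
The construction of the edge sample and the blocking values can be illustrated in internal memory as follows. This is a sketch under the assumptions that edge weights are distinct and that a vertex with no (k+1)-th lightest edge gets an infinite blocking value (a convention chosen here, not taken from the text); the name `sample_light_edges` is hypothetical, and a per-vertex sort replaces the global sorting steps an external implementation would use.

```python
import math
from collections import defaultdict


def sample_light_edges(edges, n_i):
    """Return (sample, blocking) for edges (u, v, w) and superphase size n_i.

    sample   : sorted list of (w, u, v) that are among the k lightest edges
               of at least one endpoint, with k = ceil(sqrt(n_i)).
    blocking : weight of the (k+1)-th lightest edge at each vertex, or
               infinity if the vertex has at most k incident edges.
    """
    k = math.ceil(math.sqrt(n_i))
    incident = defaultdict(list)
    for u, v, w in edges:
        incident[u].append((w, u, v))
        incident[v].append((w, u, v))

    sample, blocking = set(), {}
    for x, inc in incident.items():
        inc.sort()
        sample.update(inc[:k])
        blocking[x] = inc[k][0] if len(inc) > k else math.inf
    return sorted(sample), blocking
```

On the complete graph on four vertices with weights (0,1):1, (0,2):2, (0,3):3, (1,2):4, (1,3):5, (2,3):6 and n_i = 4 (so k = 2), only the edge of weight 6 is left out of the sample, and the blocking values of vertices 0, 1, 2, 3 are 3, 5, 6, 6.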

The only thing that remains to be described is how the individual phases in
superphase i are performed such that after superphase i the number of
supervertices is at most V/N_{i+1} and such that only edges that actually belong
to the MST are included. A phase is performed as in the basic vertex reduction
algorithm: For each vertex v consider the adjacent edge e with minimum weight in
E'_i. If the weight of e is smaller than the blocking value of v, then we select e
for contraction. If the weight of e is larger than the blocking value, no edge is
selected for v, since there might be a lighter edge adjacent to v in E_i − E'_i. The
selected edges are contracted in O(sort(E'_i)) I/Os (using the algorithm in
[12, 22, 25] or a simpler algorithm which we will include in the full version).
After the contraction, the blocking value of a supervertex is set to be the
minimum of the blocking values of the contracted vertices. The algorithm is correct
as a simple induction argument can be used to show that for every supervertex v
the (contracted) edge sample contains all edges adjacent to v with weight smaller
than the blocking value of v (i.e. the edges selected in the next phase belong
to the MST). If in superphase i the blocking value of a supervertex v prevents
us from selecting an edge for v to be included in the MST, then v must be the
contraction of at least √N_i vertices from V_i. This follows from the fact that the
blocking value of v corresponds to the blocking value of some vertex u in V_i and
v must span the vertices adjacent to u in E'_i. If no blocking value prevents
us from selecting an edge for v, then after ⌈log √N_i⌉ phases v must have size at
least 2^(log √N_i) = √N_i. It follows that superphase i reduces the number of
vertices by a factor of at least √N_i, i.e. the number of vertices after superphase i
is at most V_i/√N_i ≤ V/(N_i · √N_i) = V/N_{i+1}, as claimed by the invariant.

Lemma 2. Let G = (V, E) be an undirected weighted graph. The MST problem
on G can be reduced to the MST problem on a graph with at most E/B vertices in
O(sort(E) · log log(VB/E)) I/Os.

3 Multi-way Planar Graph Separation 

In this section, we show how to separate a planar graph G into O(N/R)
subgraphs with O(R) vertices each and a set of O(sort(N)) separator vertices using
O(sort(N)) I/Os.

Given a BFS tree T of G, Hutchinson et al. showed how to compute an
O(√N)-separator for G in O(sort(N)) I/Os. Their algorithm closely follows the
algorithm by Lipton and Tarjan [23]. The BFS tree T has the property that no
edge crosses two or more levels, and hence every level in T is a separator in G.
The basic idea is to use the "middle" level ℓ_1 in T (the level containing the
vertex with number N/2 in the BFS numbering) as the separator. Level ℓ_1 has
the property that the total number of vertices on levels above ℓ_1, as well as on
levels below ℓ_1, is less than N/2. The problem is that ℓ_1 might contain more than
O(√N) vertices. However, there exists a level ℓ_0 above ℓ_1 and a level ℓ_2 below
ℓ_1 with O(√N) vertices each, such that ℓ_2 − ℓ_0 ≤ √N (that is, ℓ_0 and ℓ_2 are not
too far away from ℓ_1). Levels ℓ_0 and ℓ_2 divide G into three subgraphs G_0, G_1 and
G_2 consisting of the vertices on the levels above ℓ_0, between ℓ_0 and ℓ_2 and below
ℓ_2 respectively, with the property that G_0 and G_2 contain less than N/2 vertices
and G_1 has a spanning tree of bounded height √N. Refer to Fig. 1(a). It is easy
to see that in order to find a separator for G it is enough to find a separator
in G_1 [23]. Such a separator can be found using properties of the dual graph
of G_1. The dual graph G* = (V*, E*) of a planar graph G is a planar graph with
a vertex for each face of G whose edges are in one-to-one correspondence with
the edges of G. The dual graph G* is obtained by placing a vertex in each face
of G and connecting two faces f_i and f_j adjacent to a common edge e = (u, v)
of G with an edge (f_i, f_j) in E*. The edge (f_i, f_j) in G* is called the dual edge
of (u, v) in G. Let E' ⊆ E be a subset of edges in G. It is well known that
(V, E') is a spanning tree of G if and only if (V*, (E − E')*) is a spanning tree
in G*. Thus the edges in (E − T)* form a spanning tree in G* which we
denote T*. An example is shown in Fig. 2(a). If T has bounded height √N
then every edge in (E − T) (and therefore the corresponding edge in (E − T)*)
determines a cycle in G, formed together with a path in T, with at most 2√N
vertices. Assuming (without loss of generality) that G is triangulated, Lipton and
Tarjan [23] proved that there exists an edge e ∈ (E − T) such that the number of
vertices inside and outside the cycle defined by e is at most 2N/3, and showed how
it can be computed efficiently using a bottom-up traversal of the dual tree T*.
Hutchinson et al. showed how to perform all these operations using O(sort(N)) I/Os.

As discussed in the introduction, the 0(sort(A^)) separator algorithm can 
be used to develop a recursive 0(log ^ • sort(iV)) multi-way separator algorithm 
in a straightforward way. The idea in our new 0(sort(N)) algorithm is to obtain 
0(log^/g recursion depth by increasing the fan-out of the separation from 2 
to ^ and implement each step in 0{^) I/Os. In order to divide the graph in ^ 
subgraphs we use ideas similar to the ones used by Goodrich The general 
idea is the following: Instead of finding only one level cutting the graph in two 
halves, we find (roughly) ^ levels which cut the graph in 0( jj^)-sized chunks. 
We then use these levels to find a set of levels with few vertices which divide 
G into subgraphs such that each subgraph is either of size 0{j^) or has a 
spanning tree of bounded height 0(\bR). We then subdivide the subgraphs with 
bounded height into graphs of size 0{R) using properties of the dual graph. In 
Section ITT^ we show how this can be done I/O-efhciently and prove the following 
lemma: 

Lemma 3. A graph G with a spanning tree T of height H can be divided into
O(N/R) subgraphs of size O(R) each and O((N/R) · H) separator vertices in total
using O(sort(N)) I/Os.

After subdividing the bounded height subgraphs we recursively subdivide the 
subgraphs of size 0{j^). In Section 1^1 we give the details in our algorithm 
and prove the following: 

Theorem 2. Let G = (V, E) be a planar graph and T a breadth-first search tree 
for G. Furthermore assume 3 e > 0 such that M > . For any R = I2{M), 

G can be partitioned into O(N/R) subgraphs G_i of size O(R) each and a set of
separator vertices S of size O(sort(N)) using O(sort(N)) I/Os.

3.1 Separating Planar Graphs 

In this section we prove Theorem 2 using Lemma 3. Let L(i) be the total number
of vertices on levels 0 through i of T and define the starter levels to be the levels
i such that the interval (L(i), L(i+1)] contains a multiple of ⌈N/X⌉, for some
0 < X < N. There are at most X starter levels and the number of vertices
between consecutive starter levels is smaller than ⌈N/X⌉.

Just like the ℓ_1 level in Lipton and Tarjan's algorithm the starter levels
divide G into subgraphs of "small" size. However, as previously, the starter levels
can contain too many vertices. Therefore we consider the first level above each
starter level, as well as the first level below each starter level, containing at most
Y vertices, for some 0 < Y < N. We call these levels the cutter levels. The cutter
levels divide G into O(X) subgraphs G_i, consisting of the vertices between two
consecutive cutter levels, with the property that if the two cutter levels defining
G_i are within two (consecutive) starter levels then G_i has size O(N/X). If the two
cutters defining G_i are not within two consecutive starter levels then G_i has a
spanning tree of depth O(N/Y). Refer to Fig. 1(b).
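
The selection of starter and cutter levels can be illustrated on a list of BFS level sizes. The sketch below makes two assumptions that are not settled by the text: the interval defining a starter level is read as (L(i), L(i+1)], and the search for a cutter level starts at the starter level itself, so a starter level with at most Y vertices is its own cutter. The name `starter_and_cutter_levels` is hypothetical.

```python
import math


def starter_and_cutter_levels(level_sizes, X, Y):
    """Return (starter levels, cutter levels) for BFS levels of given sizes."""
    n = sum(level_sizes)
    step = math.ceil(n / X)

    # prefix[i] = L(i), the number of vertices on levels 0..i.
    prefix, total = [], 0
    for size in level_sizes:
        total += size
        prefix.append(total)

    # (L(i), L(i+1)] contains a multiple of step exactly when the integer
    # quotient by step increases from L(i) to L(i+1).
    last = len(level_sizes) - 1
    starters = [i for i in range(last)
                if prefix[i + 1] // step > prefix[i] // step]

    cutters = set()
    for s in starters:
        above = next((j for j in range(s, -1, -1) if level_sizes[j] <= Y), None)
        below = next((j for j in range(s, last + 1) if level_sizes[j] <= Y), None)
        cutters.update(j for j in (above, below) if j is not None)
    return starters, sorted(cutters)
```

For level sizes [1, 2, 6, 3, 8, 2, 2] with X = 3 and Y = 2, the starter levels are 1, 3 and 5; starter level 3 has three vertices and is replaced by the nearest small levels 1 and 5, so the cutter levels are 1 and 5.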





Fig. 1. (a) Illustration of the planar separator algorithm [23]; (b) Starter and
cutter levels in T

As mentioned, the idea in our algorithm is to apply Lemma 3 to the subgraphs
of bounded height O(N/Y) and recursively separate the subgraphs of size O(N/X).
By choosing Y = N/√R each bounded height subgraph G_i of size N_i has height
√R, and it can thus be separated into Θ(N_i/R) subgraphs of size O(R) and
O((N_i/R) · √R) = O(N_i/√R) separator vertices using O(sort(N_i)) I/Os. Note that
as we are not recursing on G_i (that is, we are not touching G_i again), the total
cost of separating all such subgraphs over all levels of the recursion adds up to
O(sort(N)) in total. The separator vertices are the vertices of the O(X) cutter
levels (each cutter level has at most Y = N/√R vertices), the separator vertices
resulting from applying Lemma 3 to the subgraphs of bounded height and the
separator vertices resulting from the recursive calls. Thus the total number of
separator vertices is given by S(N) ≤ X · (N/√R) + N/√R + X · S(N/X). If we choose

X = and assume M > for some £ > 0, it can be shown that 

f = 0{logM/B f )> so that S{N) = 0{sort{N)). 

The only thing remaining to discuss is how to represent a subgraph G_i
between two cutter levels c_i and c_{i+1} in the format needed in order to apply
Lemma 3 or perform the recursive call. Both these steps require that a BFS tree
is given along with the subgraph. The part of T included in G_i is not connected
and thus it is not a BFS tree for G_i. However, we can easily produce such a
tree by introducing a "fake" root v_i and connecting it with "fake" edges to all
vertices on level c_{i+1}. Note that if T is given level-by-level this can easily be done
for all the subgraphs in O(N/B) I/Os. The fake vertices and edges are marked so
that they can be removed at the end of the algorithm. Details will appear in the
full paper.



That our algorithm uses O(sort(N)) I/Os can be seen as follows. The
preprocessing step of computing the BFS level for each vertex in T and sorting the
edges of G by level can easily be performed in O(sort(N)) I/Os using standard
techniques (such as list ranking and Euler tours) [12]. If we do not count the
I/Os used to separate the subgraphs with bounded height, one recursion step
can be performed in O(N/B) I/Os, and the recurrence for the number of I/Os used
becomes T(N) ≤ N/B + X · T(N/X). Thus T(N) = O(sort(N)). As the total number
of I/Os used to separate the subgraphs of bounded height is O(sort(N)), we
have shown that our algorithm uses O(sort(N)) I/Os in total. This concludes
the proof of Theorem 2.

So far we have only discussed the case R = Ω(M). If R is o(M) then we
can use Theorem 2 to separate G into subgraphs of size O(M), then load each
subgraph into main memory one at a time and apply Lipton and Tarjan's planar
separator algorithm until all subgraphs have size O(R). This results
in O(N/√R) separator vertices. In some applications of the graph separation it
is necessary to bound not only the total number of separators S, but also the 
number of separator vertices adjacent to any subgraph. This can be done as 
follows: For each subgraph which has adjacent separator vertices mark 

the inner vertices as inactive and apply TheoremEI until the resulting subgraphs 
have (active) vertices. Fredrickson proves that this maintains the 

same bounds for the number of subgraphs and separators given that the graph 
has bounded degree. Details will appear in the full paper. 

Corollary 1. Let G = (V, E) be a planar graph and T a breadth-first search tree 
for G. Furthermore assume 3 e > 0 such that M > Then G can be sepa- 
rated in subgraphs of 0{R) vertices each and a set S of 0{sort{N) -\- -^) 

separator vertices using 0(sort(iV)) I/Os. 

If G has bounded degree then the separation can be constructed such that each 
subgraph G_i is adjacent to O(√R) separator vertices.

3.2 Separating Planar Graphs of Bounded Height Spanning Tree 

In this section we describe how to separate, in O(sort(N)) I/Os, a planar graph
G = (V, E) with a spanning tree T of height H into O(N/R) subgraphs of size
O(R) each and O((N/R) · H) separator vertices.

Assume for simplicity that G is triangulated. (If this is not the case, we can
triangulate it using O(sort(N)) I/Os and mark the added edges so that they
can be removed at the end of the separation. Note that T remains a spanning
tree after the triangulation.) Let G* be the dual of G and let T* = (E − T)* be
the spanning tree in G*. The spanning tree T* can be computed from G and T
in O(sort(N)) I/Os using a face finding algorithm and a few sorting
steps. Each edge in T* is the dual of an edge e = (u, v) in (E − T) and there
exists a unique path from u to v in T; this path and e form a cycle in G, and
since T has bounded height H, the cycle contains at most 2H − 1 vertices. Thus
each edge in T* determines a cycle of size O(H) in G which separates G into
the vertices inside the cycle and the vertices outside the cycle. Refer to Fig. 2(a).
It can be shown that if e is the centroid edge of T* then the number of vertices
inside and outside the cycle is roughly the same.

The main idea in our algorithm is to find O(N/R) cycles which partition G
into subgraphs of roughly equal size O(R). In order to do so, we first discuss
how to find O(N/R) edges in T* such that their removal divides T* into subtrees
of roughly equal size O(R). Then we show that the duals of these edges define
O(N/R) cycles in G with the desired properties.
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Fig. 2. (a) A triangulated graph G (solid lines), T (solid thick lines) and (dot- 
ted lines), (b) The decomposition of into its 10-bridges; square vertices are the 
attachments, (c) Subtree of and the induced cycle in G. 



The decomposition of a tree into independent subtrees of approximately equal 
size was studied by Gazit et al. m in the context of parallel /^-contractions. 
We review briefly their notation and results. Let D = (V, E) be a tree with
N vertices. The weight W(v) of a vertex v in D is the number of vertices in
the subtree rooted at v. A vertex v is called R-critical if v is not a leaf and
⌈W(v)/R⌉ > ⌈W(v')/R⌉ for all children v' of v. Let C ⊆ V. Two edges e and e' of D
are C-equivalent if there exists a path from e to e' that avoids the vertices C. The
graphs induced by the equivalence classes of the C-equivalent edges are called the
bridges of C. The attachments of a bridge I are the vertices of I that are also in
C. The R-bridges of a tree D are the bridges of C, where C is the set of R-critical
vertices of D. An example of the decomposition of a tree into its R-bridges is
shown in Fig. 2(b). Gazit et al. prove the following: (1) The number of
R-critical vertices in a tree of size N is at most 2N/R − 1. (2) The number of
R-bridges in a tree with bounded degree d is at most d(2N/R − 1). (3) The number
of vertices of an R-bridge is at most R + 1. (4) If I is an R-bridge, then I can
have at most two attachments.
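
The R-critical vertices of a rooted tree follow directly from the subtree weights. The sketch below assumes the criticality condition ⌈W(v)/R⌉ > ⌈W(v')/R⌉ for every child v' and computes the weights with one bottom-up pass in internal memory; the name `r_critical_vertices` and the dictionary-of-children representation are illustrative choices.

```python
def r_critical_vertices(children, root, R):
    """Return (weight, critical) for a rooted tree.

    children : dict mapping a vertex to the list of its children
    weight   : dict, W(v) = number of vertices in the subtree rooted at v
    critical : sorted list of R-critical vertices
    """
    # Preorder via an explicit stack; reversing it visits children first.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    weight = {}
    for v in reversed(order):
        weight[v] = 1 + sum(weight[c] for c in children.get(v, []))

    def ceil_div(a, b):
        return -(-a // b)

    critical = sorted(
        v for v in order
        if children.get(v)
        and all(ceil_div(weight[v], R) > ceil_div(weight[c], R)
                for c in children[v])
    )
    return weight, critical
```

On a path of ten vertices 0, 1, ..., 9 rooted at 0 with R = 3, the weights are 10, 9, ..., 1 and the R-critical vertices are 0, 3 and 6, which is within the bound 2N/R − 1.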

As the basic step in the computation of the R-bridges of D is the
computation of the weight of each vertex, it is easy to show how standard I/O-efficient
algorithms can be used to compute the R-bridges in O(sort(N)) I/Os. If G is a
triangulated graph, T* is a binary tree, and thus it has O(N/R) R-bridges.
Each R-bridge defines two cycles in G determined by the two edges incident
to the two attachments. One of these cycles will be inside the other and there
are at most R + 1 faces inside the outer cycle but outside the inner cycle (the
faces corresponding to the vertices in the R-bridge). Thus the R-bridges of T*
determine a separation of G into O(N/R) subgraphs of at most R vertices adjacent
to O((N/R) · H) separator vertices in total. Given the R-bridges, the decomposition
of G can be easily computed in O(sort(N)) I/Os and Lemma 3 follows.
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4 Single Source Shortest Paths on Planar Graphs 



In this section we show how to use our graph separation result to obtain an
efficient SSSP algorithm for planar graphs with bounded degree.³

Consider separating a planar graph G into subgraphs G_i = (V_i, E_i) of O(R)
vertices each and a set S of separator vertices, such that each subgraph is
adjacent to O(√R) separator vertices. We call the separator vertices adjacent to G_i
the boundary vertices of G_i. Our algorithm relies on the following observation:
Consider a shortest path δ(s, t) between two vertices s and t in G and let
{s_0, s_1, ...} denote its intersection with S. The portion of δ(s, t) between s_i and
s_{i+1} is completely within some subgraph G_j and it must be the shortest path
between s_i and s_{i+1} within G_j.

Fig. 3. (a) Separation of a graph into subgraphs (boxed) and separators (black);
(b) a subgraph in the partition, its boundary vertices and boundary sets.

The main idea in our algorithm is to construct a new graph G^R by
replacing each subgraph G_i with a complete graph on its boundary vertices. If
the source vertex s is not a separator vertex, we also include s in G^R and
connect it to the boundary vertices of the subgraph containing it. The graph
G^R has S vertices and O((N/R) · (√R)²) = O(N) edges. The weight of an edge
in G^R is the length of the shortest path in G_i between the corresponding
two boundary vertices. If R = O(M) these weights can be computed as
follows: We load each subgraph G_i into main memory together with its boundary
vertices and use an internal memory all-pairs shortest-paths algorithm to
compute the weights of the new edges between the boundary vertices of G_i, and
write these edges to the disk. Since each separator vertex is a boundary vertex
for at most O(1) subgraphs (because of the bounded degree), we use at most
O(N/B + S) I/Os to load all the subgraphs and their boundary vertices. As we use
O(scan(N)) I/Os to write the new edges, it follows that G^R can be computed
in O(S + scan(N)) I/Os in total. Using S = O(sort(N) + N/√R) (Corollary 1)
and choosing R = = iog2 ^ j^/b < is 0(sort(IV)) I/Os. 
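
The construction of the reduced graph can be illustrated with a small in-memory Python sketch. It is not the external-memory procedure itself: the names `undirected`, `dijkstra` and `reduced_graph`, and the `(adj, boundary)` representation of a piece, are assumptions made for this illustration, and Dijkstra's algorithm run from every boundary vertex stands in for the unspecified internal memory all-pairs algorithm. Every edge of G is assumed to be stored in exactly one piece.

```python
import heapq

def undirected(edge_list):
    """Build an adjacency dict from (u, v, weight) triples (hypothetical helper)."""
    adj = {}
    for u, v, w in edge_list:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

def dijkstra(adj, source):
    """Textbook in-memory Dijkstra with lazy deletion of stale heap entries."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reduced_graph(pieces):
    """pieces: list of (adj, boundary); adj covers one subgraph plus its boundary
    vertices. Returns {(u, v): weight} with one entry per ordered boundary pair."""
    edges = {}
    for adj, boundary in pieces:
        for u in boundary:
            dist = dijkstra(adj, u)  # shortest paths inside this piece only
            for v in boundary:
                if u != v and v in dist:
                    key = (u, v)
                    edges[key] = min(edges.get(key, float("inf")), dist[v])
    return edges

# Two pieces sharing the separator vertices s1 and s2.
piece1 = (undirected([("s1", "a", 1), ("a", "s2", 1)]), ["s1", "s2"])
piece2 = (undirected([("s1", "b", 5), ("b", "s2", 1)]), ["s1", "s2"])
print(reduced_graph([piece1, piece2]))
```

When two pieces offer different paths between the same pair of boundary vertices, the sketch keeps the lighter edge, which is all a shortest path computation on the reduced graph needs.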

Now assume we know how to compute the shortest paths from s to all separator 
vertices in $O(\mathrm{sort}(N))$ I/Os. Using the observation mentioned above, we 
know that these paths are identical to the shortest paths in the original graph 
G. We can then compute the shortest paths from s to all the remaining vertices 
in G by loading each subgraph $G_i$ and its boundary vertices in main memory, 
and using an internal memory algorithm to compute the shortest path from s to 
each vertex t in $G_i$ using the formula 
$\delta(s, t) = \min_v \{\delta(s, v) + \delta_{G_i}(v, t)\}$, where v 
ranges over all boundary vertices of $G_i$. This takes $O(S + \mathrm{scan}(N))$ I/Os, so the 
total number of I/Os used is $O(\mathrm{sort}(N))$. 
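
As a hedged illustration of the last step, the minimum over boundary vertices can be evaluated for all vertices of one piece at once by seeding Dijkstra's algorithm with the already known boundary distances. The function name `distances_in_piece` and the dictionary-based inputs are assumptions for this sketch; if the piece contains the source itself, the source would be added to `boundary_dist` with distance 0.

```python
import heapq

def distances_in_piece(adj, boundary_dist):
    """adj: adjacency dict of one piece including its boundary vertices.
    boundary_dist: {boundary vertex v: delta(s, v)} computed on the reduced graph.
    Returns delta(s, t) = min over v of delta(s, v) + delta_piece(v, t) for all t."""
    dist = dict(boundary_dist)
    heap = [(d, v) for v, d in boundary_dist.items()]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {
    "s1": [("a", 1)],
    "a": [("s1", 1), ("s2", 1), ("c", 4)],
    "s2": [("a", 1), ("c", 1)],
    "c": [("a", 4), ("s2", 1)],
}
print(distances_in_piece(adj, {"s1": 3, "s2": 4}))
```

Seeding with several sources is equivalent to attaching a virtual source to every boundary vertex v by an edge of weight delta(s, v), so one pass yields the same values as applying the formula vertex by vertex.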

® Note that any graph can be transformed into a graph with each vertex having degree 
at most 3 using a simple transformation 
All that remains is to show how to solve the SSSP problem on the graph $G^R$ 
with $S = O(\mathrm{sort}(N))$ vertices and $O(N)$ edges in $O(\mathrm{sort}(N))$ I/Os. 

To do so we use a slightly modified version of Dijkstra’s algorithm which avoids 
the use of a decrease-key priority queue operation. We want to avoid such an 
operation since the I/O bound of the best known external data structure with 
this operation is $O(\frac{1}{B} \log_2 \frac{N}{B})$, while priority queues with an 
$O(\frac{1}{B} \log_{M/B} \frac{N}{B})$ bound are known if this operation is not supported m. 
During the algorithm we maintain a list L of pairs of vertices of $G^R$ and their 
distances. Initially all distances are $\infty$. We maintain the invariant that the 
distance of a vertex in L is identical to the distance stored in the priority queue 
controlling the algorithm. The algorithm repeatedly performs a deletemin operation 
on the priority queue to obtain the next vertex v to process; then the 
$O(\sqrt{R}) = O(B / \log_{M/B} \frac{N}{B})$ edges incident to v are loaded using O(1) I/Os 
and the $O(\sqrt{R}) = O(B / \log_{M/B} \frac{N}{B})$ boundary vertices adjacent to v are 
determined. These vertices (and their current distances) are loaded from L using 
$O(B / \log_{M/B} \frac{N}{B})$ I/Os and, without further I/Os, we then compute which 
vertices need to have their distances updated. Finally, the new distances are 
written back to L and the corresponding updates are performed on the priority 
queue. Note that as we know the current distance of a vertex which needs to have 
its distance updated, we can perform the update in 
$O(\frac{1}{B} \log_{M/B} \frac{N}{B})$ I/Os using a delete and an insert operation. 
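
The interplay between the list L and a priority queue without decrease-key can be sketched in memory as follows. This is only an illustration under stated assumptions: `NoDecreaseKeyPQ` is a hypothetical stand-in built on Python's `heapq` with lazy deletion, not an external-memory priority queue, and the sketch inserts a vertex only once it has a finite distance. The point it shows is that a delete needs the stored priority, which the algorithm can always supply because it reads the current distance from L.

```python
import heapq
from collections import Counter

class NoDecreaseKeyPQ:
    """Priority queue offering only insert, delete (with known priority), deletemin."""

    def __init__(self):
        self._heap = []
        self._dead = Counter()  # entries deleted but not yet popped

    def insert(self, vertex, dist):
        heapq.heappush(self._heap, (dist, vertex))

    def delete(self, vertex, dist):
        # The caller must pass the priority currently stored for the vertex.
        self._dead[(dist, vertex)] += 1

    def delete_min(self):
        while self._heap:
            item = heapq.heappop(self._heap)
            if self._dead[item] > 0:
                self._dead[item] -= 1
                continue
            return item
        return None

def sssp(adj, source):
    """Dijkstra with non-negative weights; L mirrors the priorities in the queue."""
    inf = float("inf")
    L = {v: inf for v in adj}
    L[source] = 0
    pq = NoDecreaseKeyPQ()
    pq.insert(source, 0)
    while True:
        item = pq.delete_min()
        if item is None:
            return L
        d, v = item
        for u, w in adj[v]:
            old, new = L[u], d + w
            if new < old:
                if old != inf:
                    pq.delete(u, old)  # old priority is known from L
                pq.insert(u, new)
                L[u] = new

graph = {
    "s": [("a", 4), ("b", 1)],
    "a": [("s", 4), ("b", 2), ("c", 1)],
    "b": [("s", 1), ("a", 2), ("c", 5)],
    "c": [("a", 1), ("b", 5)],
}
print(sssp(graph, "s"))
```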

Our algorithm performs $O(N)$ operations on the priority queue using 
$O(\mathrm{sort}(N))$ I/Os in total. It also uses $O(S) = O(\mathrm{sort}(N))$ I/Os in total to load 
the neighbors of each vertex. Thus the I/O use is dominated by the 
$O(B / \log_{M/B} \frac{N}{B})$ I/Os used for each vertex to load its adjacent vertices 
from L. Since there are $O(\mathrm{sort}(N))$ vertices, this sums up to 
$O(B / \log_{M/B} \frac{N}{B}) \cdot O(\mathrm{sort}(N)) = O(N)$ I/Os in total. 

In order to improve the I/O bound to $O(\mathrm{sort}(N))$ we modify the algorithm, 
taking into account that there is some implicit adjacency between the boundary 
vertices. Let a boundary set be a maximal subset of boundary vertices such that 
all boundary vertices in the subset are adjacent to exactly the same subgraphs. 
An example is shown in Fig. 3(b). Fredrickson irz! showed that the number of 
boundary sets is equal to the number of subgraphs $O(N/R)$. We therefore modify 
our algorithm such that the vertices in the same boundary set are stored 
consecutively in L. Otherwise the algorithm remains unmodified. When a vertex v is 
processed, the relevant boundary sets are determined and loaded from L as before. 
However, now we can think of the accesses as involving full boundary sets, 
as opposed to boundary vertices. Each boundary set is accessed 
$O(B / \log_{M/B} \frac{N}{B})$ times (once by each of its adjacent boundary vertices), 
and as there are $O(N/R)$ boundary sets we use 
$O(\frac{B}{\log_{M/B}(N/B)} \cdot \frac{N}{R}) = O(\mathrm{sort}(N))$ I/Os in total. 
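
The two totals can be checked numerically. The sketch below assumes the choice $\sqrt{R} = B / \log_{M/B}(N/B)$, which is the value that makes the stated bounds consistent with one another; the function name `sssp_io_terms` and the sample parameters are illustrative only, and constant factors are ignored.

```python
import math

def sssp_io_terms(N, M, B):
    """Evaluate the leading terms of the two accounting schemes (constants dropped)."""
    log_term = math.log(N / B) / math.log(M / B)   # log_{M/B}(N/B)
    sort_n = (N / B) * log_term                    # sort(N)
    sqrt_r = B / log_term                          # assumed choice of sqrt(R)
    r = sqrt_r ** 2
    per_vertex_total = sqrt_r * sort_n             # one load per adjacent vertex
    per_set_total = (N / r) * sqrt_r               # accesses counted per boundary set
    return {"sort": sort_n, "R": r,
            "per_vertex": per_vertex_total, "per_set": per_set_total}

terms = sssp_io_terms(N=2**30, M=2**20, B=2**10)
print(terms)
```

With these parameters the per-vertex accounting comes out at about N, while the per-boundary-set accounting matches sort(N), mirroring the improvement described above.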

Theorem 3. Let G be a bounded degree planar graph and T a BFS tree for G. 
Furthermore assume there exists $\varepsilon > 0$ such that M > . The SSSP problem on G can 
be solved in $O(\mathrm{sort}(N))$ I/Os. 
I/O-Space Trade-Offs 

(Extended Abstract) 
Abstract. We define external memory (or I/O) models which capture 
space complexity and develop a general technique for deriving I/O-space 
trade-offs in these models from internal memory model time-space trade- 
offs. Using this technique we show strong I/O-space product lower bounds 
for Sorting and Element Distinctness. We also develop new space 
efficient external memory Sorting algorithms. 

1 Introduction 

In internal memory models the time and space complexity, as well as trade-offs 
between the two, are well studied for fundamental problems such as Sorting 
and Element Distinctness. For example, Pagter and Rauhe m recently 
proved an $O(N^2)$ upper bound on the time-space product for Sorting N elements, 
matching a lower bound of Beame 0. Their algorithm can be used to 
improve space usage of time-optimal internal memory Sorting by a factor of 
$O(\log^2 N)$ compared to classical algorithms like MergeSort and HeapSort. 
Such an improvement would be of considerable practical interest when dealing 
with massive data sets residing on external storage devices such as disks. For 
example, if dealing with 50GB of data even a factor of log N would amount to a 
space reduction of a factor of more than 30. Unfortunately, very little is known 
about space complexity in external memory models where the main complexity 
measure is the number of I/Os needed to solve a problem. For example, even 
though several I/O-optimal external Sorting algorithms have been developed, 
no algorithm using sub-linear disk space — not counting the (read only) space 
holding the input — is known. One reason for this is that no external memory 
model capturing sub-linear space complexity has been defined. 
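
The factor quoted for 50GB of data is easy to reproduce. The sketch below is a back-of-the-envelope check; the function name `log_factor` and the 8-byte element size are assumptions, since the text does not say how large an element is.

```python
import math

def log_factor(num_bytes, element_bytes=1):
    """Return log2 of the number of elements stored in num_bytes bytes."""
    return math.log2(num_bytes // element_bytes)

fifty_gb = 50 * 2**30
print(log_factor(fifty_gb))      # elements of one byte each
print(log_factor(fifty_gb, 8))   # assumed 8-byte elements
```

Under either assumption the logarithm exceeds 30, in line with the claim in the text.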

In this paper we define external memory models which capture space com- 
plexity and use them to study I/O-space trade-offs for fundamental problems 
such as Sorting and Element Distinctness. 
1.1 Related Work 



The study of internal memory time-space trade-offs attempts to give formulae 
that relate time T and space S for a given problem. A typical result is of the 
form $T \cdot S = \Theta(f(N))$ for some problem and function $f$. Time-space trade-offs 
for Sorting are well studied and for time above (roughly) $N \log_2 N$ the exact 
complexity has been established to be $T \cdot S = \Theta(N^2)$. 
This means that if $S = \Omega(N / \log_2 N)$ then we can sort in time $O(N \log_2 N)$, and 
this is the best possible. Similarly, if T = then S = is possible 

and required. Time-space trade-offs for Element Distinctness have also been 
studied extensively mm- 

The standard model for studying external memory algorithms (or I/O-algo- 
rithms) is the I/O-model of Aggarwal and Vitter P|. In this model the internal 
memory of size M is divided into m = M/B blocks of size B each. The external 
memory is also divided into blocks of size B, and initially the N input data 
elements reside in the first n = N/B blocks of external memory. An I/O is the 
movement of one block of elements to or from external memory, and the goal is 
to design algorithms that use as few I/Os as possible under the constraint that 
computation can only be performed on elements in internal memory. 
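
To make the cost measure concrete, the following sketch counts I/Os for a scan and for a standard multiway external merge sort. The merge sort is a textbook algorithm used here for illustration, not something described in this paper, and the names `ceil_div`, `scan_ios` and `merge_sort_ios` are assumptions. One block of internal memory is reserved as the output buffer, so each pass merges m - 1 runs.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)

def scan_ios(N, B):
    """Reading N consecutive elements touches n = ceil(N / B) blocks."""
    return ceil_div(N, B)

def merge_sort_ios(N, M, B):
    """I/Os of multiway merge sort: run formation plus (m - 1)-way merge passes."""
    n = ceil_div(N, B)
    m = M // B
    assert m >= 3, "need at least two input buffers and one output buffer"
    runs = ceil_div(N, M)      # memory-sized sorted runs
    ios = 2 * n                # run formation reads and writes every block once
    while runs > 1:
        runs = ceil_div(runs, m - 1)
        ios += 2 * n           # each merge pass reads and writes every block once
    return ios

print(scan_ios(2**20, 2**5), merge_sort_ios(2**20, 2**10, 2**5))
```

The number of passes grows like the logarithm of n to the base m, which is where the familiar sorting bound comes from.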

During the last decade, a large number of I/O-efficient algorithms have been 
developed in the I/O-model — see e.g. recent surveys |S|23]- For example, it is 
well known that (under some restrictions) Sorting requires $\Omega(n \log_m n)$ I/Os 
and several $O(n \log_m n)$ algorithms using $O(n)$ extra space (disk blocks) have 
been developed. However, no results are known about I/O-space trade-offs for 
Sorting. Similarly for Element Distinctness, Arge et al. ^ showed that 
the problem is as hard as Sorting in a comparison based model, and Arge 
and Miltersen 0 gave an 0{n) I/O (and space) randomized algorithm for the 
problem, but nothing is known with respect to I/O-space trade-offs. 

It should be mentioned that one reason no I/O-space trade-offs are known 
in the I/O-model is that it allows for the input to be overwritten. Thus $\Omega(n)$ 
space is always available for algorithms in the model and one cannot formally 
express sub-linear space bounds. External memory models that capture space 
complexity have been introduced in the area of straight-line computation mod- 
els (i.e., models where branching or “if-then-else” statements are not allowed). 
An example is the so-called red-blue pebble games — see e.g. mini- However, 
disallowing branching is too severe a restriction for our purposes. 



1.2 Our Results 

In Section 2 we introduce computational models which allow us to study I/O-space 
trade-offs. We first introduce an extension of the Aggarwal and Vitter 
model (or rather an I/O-version of the RAM-model equivalent to their model) in 
which sub-linear space complexity can be expressed. This is the model we will use 
when developing space-efficient I/O-algorithms. We next extend the branching 
program model, the model most commonly used for showing internal 
memory time-space trade-offs, in order to capture I/O and space complexity 
simultaneously. We also prove that lower bounds in this model are valid in the 
extended I/O-model. 

In Section 3 we develop a technique for obtaining I/O-space trade-offs for 
external memory computation from time-space trade-offs for internal memory 
computation. More precisely, let T and S denote the time and space usage of an 
internal memory algorithm solving a problem P over U = i?}. Similarly, 

let $T^{IO}$ and $S^{IO}$ denote the number of I/Os and the space usage of an external 
memory algorithm solving P. We prove that if $T = \Omega(f(N, S))$, then = 

Using the general result we prove I/O-space trade-offs for Sorting and El- 
ement Distinctness. Combining the result with a result of Beame 0, we for 
example show that for Sorting the I/O-space product is $\Omega(N^2 / B) = \Omega(N \cdot n)$. 
Using the internal memory algorithm of Pagter and Rauhe m in external mem- 
ory shows that this bound is tight among algorithms using more than (roughly) 
$N \log_2 N$ I/Os. This is an interesting result, as it suggests that when disk space 
is restricted, traditional internal memory approaches can lead to optimal exter- 
nal memory algorithms. The results for Element Distinctness are obtained 
by applying our results to the internal memory lower bounds of Ajtai |3] and 
Yao 1^ . 

Finally in Section 0 we discuss an external memory generalization of the 
algorithm of Pagter and Rauhe which for certain choices of M and B obtains 
the optimal I/O-space trade-off for Sorting down to the optimal number of I/Os 
$O(n \log_m n)$. In general however, we can only prove an $O((N^2 / (B + m)) \log_2 m)$ 
upper bound on the I/O-space product. We conjecture that our lower bound 
for Sorting is tight for all values of M and B, that is, an algorithm achieving 
$O(N^2 / B)$ exists. 



2 Models of Computation 

In this section we introduce computational models which allow us to study 
I/O-space trade-offs. In Section 2.1 we first consider upper bound models and 
in Section 2.2 we then consider lower bound models. We discuss the relationship 
between the models in Section 2.3. 

One main difference between our models and the standard I/O-model is 
that we assume input to be read-only. A natural question is whether this is 
reasonable and we claim that it is. Consider for example the task of Sorting a 
huge database by a secondary key: In such an example it might be important not 
to overwrite the original database sorted by primary key. A typical example is the 
customer database of a bank, which will normally be sorted by account numbers. 
Occasionally the bank might want a phone book over its customers, requiring 
the database to be sorted on customer names, but rarely will it be interested in 
erasing the original database used for all standard business transactions. Other 
examples occur when the input is stored on a medium which is physically read- 
only, for example on a CD-ROM. 
2.1 Upper Bound Models 

In this section we define two external memory extensions of the unit cost RAM 
model. The unit cost RAM model is a popular internal memory model for showing 
upper bounds. In this model input consists of N words $x_1, \ldots, x_N$ from some 
universe $U = \{0, 1\}^w$, and it is a normal convention that each word in memory 
can hold exactly w bits, corresponding to one input element. We use this 
convention and define a parameter $R = 2^w$ (i.e. each input element can represent 
one of R values). We will always assume that R > N. 

Definition 1 (External RAM) The external RAM consists of two parts: A 
memory layout, and an instruction set: 

Memory layout: Input is located on a read-only input medium consisting of 
N words of $\log_2 R$ bits, grouped into n = N/B consecutive blocks of B words. 
Output is to be written to a separate write-only output medium. Furthermore, we 
have an external memory consisting of blocks of B words of $\log_2 R$ bits each, and 
an internal memory consisting of M/B blocks of B words of $\log_2 R$ bits each. 
Instructions: The algorithm can execute the following instructions: 

1 Read a block from the input into internal memory 

2 Write a block from internal memory to the output 

3 Swap a block from internal memory with one from external memory (both 
may be “empty”) 

4 Perform some unit-cost operation on two words in internal memory, writing 
the result to one word in internal memory. 

□ 

The number of I/Os $T_R^{IO}$ performed by an external RAM is the number of 
times the algorithm executes instructions 1, 2 or 3, and we define space $S_R^{IO}$ to 
be the number of bits occupied by the external memory. Note that we ignore 
the $M \log_2 R$ bits available in internal memory since we will always assume that 
$S_R^{IO} > M \log_2 R$. Also note that we define space in terms of bits and not words 
or blocks. One can of course easily translate our space measure into measures 
based on words ($S_R^{IO} / \log_2 R$) or blocks ($S_R^{IO} / (B \log_2 R)$). In the introduction 
the space bounds were expressed in terms of blocks. 
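
A toy Python model may help in reading Definition 1 and the two cost measures. Everything about it is an assumption made for illustration: the class name `ExternalRAM`, the method names, the use of `None` for empty words, and in particular the choice to report space as the peak number of non-empty external blocks times the block size in bits. Instruction 4 (unit-cost operations on internal memory) costs no I/O and is left to ordinary Python code.

```python
class ExternalRAM:
    """Toy external RAM that counts I/Os and external space in bits."""

    def __init__(self, input_words, B, M, R):
        assert len(input_words) % B == 0 and M % B == 0
        self.B = B
        self.word_bits = (R - 1).bit_length()   # log2(R) when R is a power of two
        # Read-only input medium, grouped into n = N/B blocks of B words.
        self._input = tuple(tuple(input_words[i:i + B])
                            for i in range(0, len(input_words), B))
        self.output = []                         # write-only output medium
        self.external = {}                       # block index -> list of B words
        self.internal = [[None] * B for _ in range(M // B)]
        self.ios = 0
        self._peak_blocks = 0

    def read_input(self, block, slot):           # instruction 1
        self.internal[slot] = list(self._input[block])
        self.ios += 1

    def write_output(self, slot):                # instruction 2
        self.output.append(tuple(self.internal[slot]))
        self.ios += 1

    def swap(self, slot, ext_block):             # instruction 3
        incoming = self.external.get(ext_block, [None] * self.B)
        self.external[ext_block] = self.internal[slot]
        self.internal[slot] = incoming
        self.ios += 1
        occupied = sum(1 for blk in self.external.values()
                       if any(w is not None for w in blk))
        self._peak_blocks = max(self._peak_blocks, occupied)

    def space_bits(self):
        return self._peak_blocks * self.B * self.word_bits

ram = ExternalRAM(list(range(8)), B=4, M=8, R=16)
ram.read_input(0, 0)    # load the first input block
ram.swap(0, 0)          # park it in external memory
ram.read_input(1, 0)    # load the second input block
ram.write_output(0)     # emit it
print(ram.ios, ram.space_bits(), ram.output)
```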

It should be clear that the external RAM model is essentially a straightfor- 
ward extension of the I/O-model of Aggarwal and Vitter — basically the input 
has just been made read-only. 

Definition 2 (Comparison external RAM) A comparison external RAM is 
an external RAM, with instruction 4 replaced by 

4a Compare input elements, or copies thereof, in two words in internal memory 
using a binary comparison 

4b Perform some unit-cost operation on two words in internal memory which 
are not occupied by input elements, or copies thereof, writing the result to 
one word in internal memory 

□ 
The number of I/Os $T_c^{IO}$ performed by a comparison external RAM is the 
number of times the algorithm executes instructions 1, 2 or 3, and the space use 
$S_c^{IO}$ is the number of bits occupied by the external memory. 

Note that in a comparison external RAM one can only access words con- 
taining input elements, or copies thereof, through comparisons. All other words 
in internal memory can be manipulated freely. We think of each word contain- 
ing (a copy of) an input element as being marked, and such marked words can 
only be accessed via comparisons with another marked word. This somewhat 
strange way of enforcing a comparison model is a result of the fact that in an 
external memory model it is vital that we are allowed to make and move copies 
of elements (since block movement is the main complexity measure). Another 
standard way of enforcing a comparison model is to ensure that elements are in- 
divisible and that new elements cannot be produced, that is, that each word in 
memory is either empty or contains an input element (the so-called indivisibility 
assumption m)- However, there is evidence that the indivisibility assumption 
drastically increases the complexity of certain problems and furthermore, 
we would like to be able to manipulate objects such as pointers in our algorithms. 

It should be clear that the external RAM models can easily simulate algorithms 
constructed for models where one is allowed to overwrite the input: Simply make 
a copy of the input in the external memory and run the algorithm on this copy. 
In particular, the I/O-optimal $O(n \log_m n)$ Sorting algorithms of Aggarwal and 
Vitter 0 may be implemented on the comparison external RAM. Note also that 
the external RAM is at least as strong as the comparison external RAM. 



2.2 Lower Bound Models 



The models we will use for showing lower bounds (on I/O-space trade-offs) are 
extensions of the branching program model. The branching program model is 
a well established internal memory model for showing lower bounds on time-space 
complexity, see e.g. [3,9,10,11,12,15,17,24]. Branching programs come in 
two main variants: Comparison based branching programs, which were initially 
studied in [22] according to which they were introduced by Pippenger, and R-way 
branching programs introduced by Borodin and Cook uni. Detailed discussions 
of branching programs can be found in HMD. 

A comparison branching program is a directed acyclic graph (DAG) with one 
root. Each non-leaf node is labeled $(i : j)$ and has two outgoing arcs labeled 
$x_i < x_j$ and $x_i > x_j$. A computation starts at the root and proceeds in the natural 
way until a leaf is reached. If one is studying decision problems, each leaf will 
be labeled 0 or 1 depending on whether the corresponding computation rejects 
or accepts the input. For functions such as Sorting, each arc may be further 
labeled with elements from the output domain. The output of the computation 
is the ordered concatenation of the outputs encountered along the computation 
path. 

Time $T_c$ in the comparison branching program model is defined as the height 
of the branching program, corresponding to the number of comparisons performed 
in the worst case, and space $S_c$ is defined as $\log_2 |V|$, where V is the set 
of vertices of the branching program. This is an adequate space measure since 
it gives a lower bound on the number of bits required to distinguish between 
the states of the program. We will discuss this in greater detail when comparing 
RAM models and branching programs in Section 2.3. 
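
The definitions of time and space for comparison branching programs can be made concrete with a small sketch. The dictionary encoding, the node names and the functions `run`, `height` and `space` are assumptions for illustration; the second arc of each node is simply followed whenever the first comparison outcome does not hold. The example program decides whether three inputs are strictly increasing.

```python
import math

def run(program, root, x):
    """Follow the computation path for input x; return (leaf label, comparisons)."""
    node = program[root]
    comparisons = 0
    while node[0] == "cmp":
        _, i, j, if_less, otherwise = node
        comparisons += 1
        node = program[if_less if x[i] < x[j] else otherwise]
    return node[1], comparisons

def height(program, root):
    """Time T_c: the longest root-to-leaf path, counted in comparison nodes."""
    node = program[root]
    if node[0] == "leaf":
        return 0
    return 1 + max(height(program, node[3]), height(program, node[4]))

def space(program):
    """Space S_c: log2 of the number of vertices."""
    return math.log2(len(program))

program = {
    "u": ("cmp", 0, 1, "v", "rej"),     # compare x_0 with x_1
    "v": ("cmp", 1, 2, "acc", "rej"),   # compare x_1 with x_2
    "acc": ("leaf", 1),
    "rej": ("leaf", 0),
}
print(run(program, "u", [1, 2, 3]), height(program, "u"), space(program))
```

Because both rejecting paths share the single leaf `rej`, the program is a DAG rather than a tree, which is exactly what lets the vertex count, and hence the space measure, stay small.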

An R-way branching program is a DAG with one root, where each non-leaf 
(branching) node is labeled with an index i ($1 \le i \le N$) and has an outgoing arc 
for each element of $U = \{0, 1\}^w$ (recall that $R = 2^w$). A computation proceeds 
as before except that in a node labeled i the arc labeled l is followed if $x_i = l$; 
i.e., a branch is made according to the value of $x_i$. 

As previously, $T_R$ is defined as the height of the DAG, corresponding to the 
number of times we read one of the elements from the input in the worst case. 
Space $S_R$ is defined as for comparison branching programs. As we can simulate 
a comparison using two R-way branches, R-way branching programs are asymp- 
totically stronger than comparison branching programs (for R = From a 

practical point of view, the R-way branching program model is an unrealistically 
strong model of computation; given enough space to remember the value of all 
the input elements, that is $O(R^N)$ nodes or space $O(N \log_2 R)$, one can decide any 
problem after reading each element once, i.e., in linear time. On the other hand, 
when restricting the space one can prove interesting and sometimes even tight 
time-space trade-offs in the model. In the following we define external versions 
of the branching program models. 
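
The remark that unlimited space trivialises the R-way model can be illustrated by building the full tree that remembers every value read. The functions `build_tree` and `run`, the tuple encoding of nodes and the use of values 0, ..., R-1 are assumptions for this sketch; Element Distinctness is used as the decision problem.

```python
from itertools import count

def build_tree(N, R, decide):
    """Full R-way tree reading x_0, ..., x_{N-1} once each; leaves hold decide(x)."""
    nodes = {}
    ids = count()

    def build(prefix):
        nid = next(ids)
        if len(prefix) == N:
            nodes[nid] = ("leaf", decide(prefix))
        else:
            children = [build(prefix + (val,)) for val in range(R)]
            nodes[nid] = ("branch", len(prefix), children)
        return nid

    root = build(())
    return nodes, root

def run(nodes, root, x):
    """Return (answer, number of input elements read) for input x."""
    node = nodes[root]
    reads = 0
    while node[0] == "branch":
        _, i, children = node
        node = nodes[children[x[i]]]   # branch on the value of x_i
        reads += 1
    return node[1], reads

def distinct(values):
    return int(len(set(values)) == len(values))

nodes, root = build_tree(N=3, R=3, decide=distinct)
print(len(nodes), run(nodes, root, (0, 1, 2)), run(nodes, root, (1, 1, 0)))
```

Every input is decided after exactly N reads, at the price of a number of nodes that grows like R to the power N.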

Definition 3 (Comparison external branching program) A comparison 
external branching program is a comparison branching program with two types of 
nodes, comparison nodes and I/O-nodes: An I/O-node replaces any B elements 
in the internal memory of size M with any B elements from the input. A com- 
parison node can only compare two elements which are both in internal memory. 
□ 

The number of I/Os $T_c^{IO}$ performed by a comparison external branching 
program is defined as the maximum number of I/O-nodes encountered along 
any root-leaf path. As previously, space $S_c^{IO}$ is defined as $\log_2 |V|$, where V is 
the set of vertices of the branching program. 

We emphasize the fact that one can read any B elements when making a 
block transfer (I/O) from input to internal memory. This seems like a strong 
and unrealistically powerful operation, but not only can we prove interesting — 
in some cases even tight — lower bounds in this model, we also (as we will see) 
need this strength in order to simulate comparison external RAM algorithms 
(Theorem 1). Note also that the term “internal memory” is somewhat misleading, 
as the M elements are not physically present in any kind of memory. At 
any point of the computation, the elements “in internal memory” are just the 
M input elements (out of the N possible elements) the external branching pro- 
gram is allowed to compare. Finally, note that comparison external branching 
programs cannot make copies of elements, since they can only access input ele- 
ments through comparisons. 
Definition 4 (R-way external branching programs) An R-way external 
branching program is an R-way branching program with two types of nodes — 
branching-nodes and I/O-nodes: An I/O-node replaces any B elements in the 
internal memory of size M with any B elements from the input. A branching- 
node performs a branch based on the value of one of the M input elements in 
internal memory. □ 

The number of I/Os $T_R^{IO}$ performed by an R-way external memory branching 
program, as well as the space use $S_R^{IO}$, is defined as for the comparison 
external branching program. For both R-way external branching programs and 
comparison external branching programs we again use the natural assumption 
that $S_c^{IO}, S_R^{IO} > M \log_2 R$. 

We call a (comparison or R-way) external branching program where all nodes 
have in-degree at most 1 an external tree. Arge et al. jOj defined a model similar 
to a comparison external tree called the I/O decision-tree, and Munagala and 
Ranade m defined a model similar to an R-way external tree. In both models 
one is only allowed to read contiguous input elements. However, unlike external 
trees, both models contain a mechanism for rearranging (writing) the input ele- 
ments. 

A key property of (comparison or R-way) external branching programs is that 
removing the I/O-nodes results in a standard branching program. In this sub- 
section we will discuss two other properties of external branching programs. 

A branching program is called leveled if the nodes of the program can be 
partitioned into T classes $V_1, V_2, \ldots, V_T$ such that arcs emanating from nodes in 
$V_i$ go to nodes in $V_{i+1}$. Pippenger proved that any standard branching program 
of height T using space S can be transformed into a leveled branching program 
with height T + 1 and using space less than 2S; see e.g. Borodin et al. [12]. 
We say that an I/O-node w is the immediate I/O-successor of a node v, if w 
is the first I/O-node encountered on one of the paths from v to a leaf of the 
branching program. Note that v may have several immediate I/O-successors. 
We say that an external branching program is I/O-leveled, if the I/O-nodes 
can be partitioned into classes $V_1, V_2, \ldots, V_{T^{IO}}$ such that all I/O-nodes in 
$V_i$ have their immediate I/O-successors in $V_{i+1}$. With a simple modification of 
Pippenger’s proof we can show the following. 

Lemma 1 Any (comparison or R-way) external branching program using $T^{IO}$ 
I/Os and space $S^{IO}$ can be transformed into an I/O-leveled external branching 
program solving the same problem using $T^{IO} + 1$ I/Os and space less than . 

The immediate I/O-ancestor of a node v can be defined analogously to im- 
mediate I/O-successor. An external branching program is called I/O-separated 
if all internal nodes other than I/O-nodes have only one immediate I/O-ancestor. 

Lemma 2 Any I/O-leveled (comparison or R-way) external branching program 
using $T^{IO}$ I/Os and space $S^{IO}$ can be transformed into an I/O-separated and 
I/O-leveled external branching program solving the same problem using I/Os 
and less than $2S^{IO}$ space. 

Proof: The details of the proof appear in the full version of this paper jS| . □ 

In the following, we will assume without loss of generality that all external 
branching programs are I/O-leveled and I/O-separated. 



2.3 External RAM’s vs. External Branching Programs 

In this section we prove two theorems which allow us to prove lower bounds for 
external RAM’s using the external branching program models. 

Theorem 1 For any comparison external RAM algorithm solving a problem 
P in $T_c^{IO}$ I/Os and space $S_c^{IO} > M \log_2 R$ there exists a comparison external 
branching program solving P in $T_c^{IO}$ I/Os and space at most $2S_c^{IO}$. 

Proof: The basic idea of the proof is to construct a branching program with 
a node for each state of the RAM algorithm and with arcs reflecting how the 
computation proceeds. However, there seems to be one major problem with this 
idea, namely that while the comparison external RAM can make copies of input 
elements and move them around in memory, a comparison external branching 
program can only compare input elements. However, as we will show below, this 
problem can be overcome using the comparison external branching program's 
(powerful) ability to move any B input elements into internal memory instead 
of just contiguous ones. We also use the fact that comparison external RAM’s 
can only access (copies of) input elements using comparisons. 

As the comparison external RAM algorithm uses $S_c^{IO}$ bits in external memory 
and has $M \log_2 R$ bits in internal memory, it has at most 
$2^{S_c^{IO} + M \log_2 R} < 2^{2 S_c^{IO}}$ distinct states. We split these states 
into three types based on the operation performed in a given state: 

1. I/Os (instructions 1, 2, and 3). 

2. Comparisons of (copies of) input elements (instruction 4a). 

3. Other operations (on non-input elements) (instruction 4b). 

After performing an operation in a given state the comparison external RAM 
algorithm proceeds to a new state: In a type 1 state the algorithm performs an 
I/O and proceeds to one unique state. In a type 3 state the algorithm performs 
some operation and proceeds to one unique state. Note that a sequence of type 
3 states is “straight-line” in the sense that the computation performed in the 
sequence only depends on the first state in the sequence. In a type 2 state the 
algorithm proceeds to one out of two unique states, depending on the outcome 
of the comparison. 

We construct a comparison external branching program from the comparison 
external RAM as follows: For each state in the comparison external RAM we 
construct a node in the comparison external branching program, and we con- 
nect these nodes according to how computation proceeds from state to state. A 
comparison external branching program cannot contain nodes corresponding to 
states of type 3, but as these perform straight-line computation we can remove 
all such nodes/states by coalescing a node/state (or a sequence of them) into 
their first successor of type 1 or 2. Nodes corresponding to type 2 states can be 
left unmodified, but nodes corresponding to type 1 states need to be modified 
since I/O-nodes in a comparison external branching program always read B in- 
put elements from input to internal memory, while much more complicated I/Os 
can be performed in type 1 states. Nodes corresponding to type 1 states which 
only read from the input we just leave unchanged. Nodes corresponding to type 1 
states which write to output are removed and the arc between its predecessor and 
successor is labeled with the appropriate output information. Nodes correspond- 
ing to type 1 states which swap a block in internal memory with a block from 
external memory are more complicated since we can only read input elements 
in comparison external branching programs. Consider the block of B elements 
swapped into internal memory in such a node/state. Some of these elements are 
copies of the original input elements and some contain other information. The 
latter can be disregarded since the same information is represented in the state 
itself (recall that a state represents the content of the entire memory). We replace 
the node with a node which reads the relevant (at most B) elements from the input. 
In general these input elements will not constitute a block of the input but 
we utilize that external branching programs can read any B input elements in 
one swap operation. 

It should be clear that the comparison external branching program constructed 
in this way solves the problem P in $T_c^{IO}$ I/Os. The branching program 
uses less than $\log_2 2^{2 S_c^{IO}} = 2 S_c^{IO}$ space. □ 

We can prove a similar theorem for external RAM algorithms and R-way 
external branching programs. We omit the proof as it is very similar to the 
proof of Theorem 1. 

Theorem 2 For any external RAM algorithm solving a problem P in $T_R^{IO}$ I/Os 
and space $S_R^{IO} > M \log_2 R$ there exists an R-way external branching program 
solving P in $T_R^{IO}$ I/Os and space at most $2S_R^{IO}$. 

3 Lower Bounds 

In this section we first present a general method for obtaining I/O-space trade- 
offs in external branching program models from time-space trade-offs in normal 
branching program models. Using this method we then obtain trade-offs for 
Sorting and Element Distinctness. 

3.1 General Lower Bound Method 

In |f| Arge et al. describe a general technique for transforming an internal 
memory decision tree lower bound into an I/O-decision tree I/O lower bound 
(see also m). The main idea in their technique is a method for transforming 
an I/O-decision tree algorithm into an internal decision tree algorithm, such that 
the number of comparisons performed by the internal algorithm is bounded by 
a function of the number of I/Os performed by the external algorithm. In this 
section we apply the same idea to branching program models. We first consider 
the comparison model. 
Theorem 3 Suppose that for any comparison branching program solving a problem 
P in time $T_c$ and space $S_c$ we have $T_c > f(N, S_c)$. Then for any comparison 
external branching program solving P 



Proof: As mentioned, the proof is based on the ideas used by Arge et al. We 
will describe a general method for transforming a comparison external branching 
program solving P using $T_c^{IO}$ I/Os and space $S_c^{IO}$ into a comparison branching 
program solving P. The comparison branching program lower bound then results 
in a comparison external branching program lower bound. 

Consider an I/O-leveled and I/O-separated comparison external branching 
program, and as previously let the sub-branching program rooted at an I/O-node 
v consist of all nodes reachable from v without passing through another I/O-node.
Recall that as the comparison external branching program is I/O-leveled,
each comparison-node is contained in precisely one sub-branching program. Our 
aim is to replace each sub-branching program with a new sub-branching program 
that computes the total ordering of all the elements in internal memory; since we 
only have comparison based access to the input, computing the total order means 
that any question regarding the input answered by the old sub-branching pro- 
gram can be answered by the new sub-branching program. We first transform 
the comparison external branching program into another comparison external 
branching program where we know the total order of the input elements in in- 
ternal memory in each I/O-node. (Note that there are M! different orderings of 
the M elements in internal memory). In order to do so we first make M! copies of
the original comparison external branching program; in our construction each
of these copies will be used to represent a unique total order of the elements in
internal memory. Each copy of the comparison external branching program is now
transformed into another comparison external branching program I/O-level by
I/O-level, top-down (while maintaining the invariant that the internal memory
element order is known in I/O-nodes on already processed levels). We transform
the sub-branching program rooted in I/O-node v as follows: We replace the 
sub-branching program with a sub-branching program which first computes the 
total order of the B input elements loaded into internal memory by v (i.e. it sorts 
them), and then finds the positions of the B elements among the sorted elements 
already in internal memory (i.e. it merges the B “new” elements with the $M - B$
“old” elements). As shown in p], the height of the new sub-branching program
can be bounded by $O(B \log_2 B + B \log_2 M) = O(B \log_2 M)$, which in turn means
that it contains $2^{O(B \log_2 M)}$ nodes. Each leaf $l_n$ of the new sub-branching
program corresponds to a total order of the elements in internal memory. Thus
it also corresponds to a unique leaf $l_o$ in the original sub-branching program,
namely the leaf at which inputs with this particular total ordering would end
up. To guarantee that the transformed comparison external branching program
solves P, we want to connect $l_n$ to the same I/O-node $v_o$ on the next I/O-level
as $l_o$ is connected to. Note however, that several leaves (total orders) in the new
sub-branching program could correspond to $l_o$. In order not to lose information
about the order of the elements in internal memory, we therefore connect $l_n$ to
$v_o$ of the copy of the original program corresponding to the total order of the
elements in internal memory. After processing all I/O-nodes on the same level
as v, we go on and process the next I/O-level.

After the transformation we have a new comparison external branching pro- 
gram with the same overall structure as the original comparison external branch- 
ing program, specifically the new branching program has at least the same knowl- 
edge about the input as the original branching program. Consequently the new 
branching program can answer any question that the original branching program 
answers and thus it solves problem P. 

From the construction it should also be clear that the new comparison
external branching program has the same I/O-height $T_C^{IO}$ as the original
comparison external branching program. That the space use of the new
comparison external branching program is $O(S_C^{IO})$ can be seen as follows: We have
M! copies of the original program, each containing no more than $|V| = 2^{S_C^{IO}}$
I/O-nodes. For each I/O-node we have a sub-branching program of size less
than $2^{O(B \log_2 M)}$. Thus in total the new comparison external branching program
uses space

$$O\left(\log_2\left(M! \cdot 2^{S_C^{IO}} \cdot 2^{O(B \log_2 M)}\right)\right) = O(M \log_2 M + S_C^{IO} + B \log_2 M) = O(M \log_2 R + S_C^{IO}) = O(S_C^{IO}).$$

Next we simply remove the I/O-nodes from the transformed comparison
external branching program and obtain a (standard) comparison branching
program with height $T_C = O(T_C^{IO} \cdot B \log_2 M)$ using space $S_C = O(S_C^{IO})$. The
theorem follows since $T_C \ge f(N, S_C)$. □
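
The last step of the proof can be written out as a short worked derivation. This is only a sketch: the constant $c > 0$ is a hypothetical name for the constant hidden in the height bound, and the space argument is left as $O(S_C^{IO})$ rather than simplified further.

```latex
T_C \le c \cdot T_C^{IO} \cdot B \log_2 M
\quad\text{and}\quad
T_C \ge f(N, S_C), \qquad S_C = O(S_C^{IO})
\;\Longrightarrow\;
T_C^{IO} \;\ge\; \frac{f(N, S_C)}{c \, B \log_2 M}
\;=\; \Omega\!\left(\frac{f\big(N, O(S_C^{IO})\big)}{B \log_2 M}\right).
```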

We can prove a similar theorem for R-way external branching programs.

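Theorem 4 Suppose that for any R-way branching program solving a problem P
in time $T_R$ and space $S_R$ we have $T_R \ge f(N, S_R)$. Then for any R-way external
branching program solving P in $T_R^{IO}$ I/Os and space $S_R^{IO}$ we have

$$T_R^{IO} = \Omega\left(\frac{f(N, S_R^{IO})}{B}\right).$$
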

Proof (sketch): We use the same proof technique (construction) as in the
comparison model (Theorem 3). The $B \log_2 M$ factor in the comparison model
construction was a result of this being the height of the sub-branching program 
used to obtain full information about the elements in internal memory after an 
I/O. We will use the power of the R-way model to decrease this height and 
thus obtain an improved bound. In the R-way model, having full information 
about the elements in internal memory corresponds to knowing the value of all 
the elements. Thus the main idea in the R-way construction is to replace each 
sub-branching program with a sub-branching program which “remembers” the
value of the B new elements in internal memory after an I/O. Since such a sub- 
branching program has height B (we just read each element once) the theorem 
follows. Details appear in the full version of this paper [B|. □

Note the interesting and somewhat counterintuitive fact that by switching
to a stronger model (from Theorem 3 to Theorem 4) we obtain a better lower
bound.

3.2 Applications of the Lower Bound Method 

In the R-way model, Beame 0 proved that $T_R \cdot S_R = \Omega(N^2)$ for the problem of
Sorting elements in the universe U = {1, . . . , A^}. Using Theorem 4 we thus
obtain the following.

Corollary 1 Any R-way external memory branching program Sorting N numbers
from a universe U = {1, . . . , A^} has $T_R^{IO} \cdot S_R^{IO} = \Omega(N^2/B)$.

Ajtai PI proved that for Element Distinctness over U = {1, . . . , A^} we
have that if $T_R = O(N)$ then $S_R = \Omega(N)$. Using Theorem 4 we obtain the
following.

Corollary 2 Any R-way external memory branching program solving the Element
Distinctness problem on N numbers from the universe U = {1, . . . , A^}
with $T_R^{IO} = O(n)$ uses space $S_R^{IO} = \Omega(N)$.

Corollary 1 implies (among other things) that we must use $\Omega(N/\log_m n)$
space in order to sort N elements in the optimal $O(n \log_m n)$ I/Os. This should
be compared to the $N \log_2 R$ space use of the sorting algorithm of Aggarwal and
Vitter [2]. Corollary 2 means that in order to decide Element Distinctness
in a linear number of I/Os we must use at least linear extra space.
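
The space bound for I/O-optimal sorting follows by dividing the I/O-space product bound for Sorting by the optimal number of I/Os. A short worked derivation, assuming the usual external-memory shorthand $n = N/B$ (an assumption here, since the shorthand is not restated in this section):

```latex
T_R^{IO} \cdot S_R^{IO} = \Omega\!\left(\frac{N^2}{B}\right),
\qquad
T_R^{IO} = O(n \log_m n) = O\!\left(\frac{N}{B} \log_m n\right)
\;\Longrightarrow\;
S_R^{IO} = \Omega\!\left(\frac{N^2 / B}{(N/B) \log_m n}\right)
         = \Omega\!\left(\frac{N}{\log_m n}\right).
```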

Yao [21] proved that any comparison branching program deciding Element
Distinctness on N numbers must have $T_C \cdot S_C = \Omega(N^{2-\epsilon(N)})$, where
$\epsilon(N) = 5/\sqrt{\log_2 N}$. Using Theorem 3 we thus obtain the following.

Corollary 3 Any comparison external branching program deciding Element
Distinctness on N numbers has $T_C^{IO} \cdot S_C^{IO} = \Omega\left(\frac{N^{2-\epsilon(N)}}{B \log_2 M}\right)$.

4 Upper Bounds 

In this section we briefly discuss our new upper bounds on the I/O-space product 
for Sorting. Details appear in the full version of this paper [B|. 

Modifying the internal memory Sorting algorithm of Pagter and Rauhe ism 
in a straightforward way to make it work in external memory we can obtain the 
following. 

Theorem 5 There exist positive constants $c_1$ and $c_2$ such that there exists a
comparison external RAM algorithm Sorting N numbers in
$c_1 N \log_2 N \le T_C^{IO} \le c_2 N^2/(B \log_2 N)$ I/Os and space $S_C^{IO}$ such that $T_C^{IO} \cdot S_C^{IO} = O(N^2/B)$.

It follows from Corollary 1 that Theorem 5 is optimal. Note that this means
that traditional internal memory approaches can lead to optimal external mem- 
ory algorithms when disk space is restricted. 

In the more important case where $n \log_m n \le T_C^{IO} \le N \log_2 N$ (roughly) we
can sometimes also obtain optimal bounds. In order to do so we need to make
several modifications to the algorithm by Pagter and Rauhe. Their algorithm
is based on a complicated binary tree data structure similar to a tournament
tree (see e.g. m). By designing an m-ary version of their data structure we first
reduce $T_C^{IO}$ to $O(N^2/(kB) + N \log_2 m \cdot \log_m k)$ using space $O(k \log_2 m)$, i.e. we
obtain a data structure where we may vary I/O and space usage by varying
the parameter k. To improve this bound we utilize a variant of the buffering
technique of Arge as well as the fact that one can sort M elements in internal
memory without performing any I/Os. Using these ideas we can further reduce
$T_C^{IO}$ to

$$O\left(\frac{N^2}{kB^2} + \frac{N \log_m k}{B}\right),$$

increasing the space use with a factor B to $O(kB \log_2 m)$. By choosing k
appropriately we then obtain the following.

Theorem 6 There exist positive constants $c_1$ and $c_2$ such that there exists a
comparison external RAM algorithm Sorting N numbers in
$c_1 n \log_m n \le T_C^{IO} \le c_2 N \log_2 N$ I/Os and space $S_C^{IO}$ such that $T_C^{IO} \cdot S_C^{IO} = O((N^2 \log_2 m)/B)$.

Note that Theorem 6 means that if M > (a realistic and standard
assumption in the external memory literature) we can sort I/O-optimally (using
$O(n \log_m n)$ I/Os) using (sub-optimal) space $O((N \log_2 m)/\log_m n)$. In
comparison, the Sorting algorithm of Aggarwal and Vitter [2] uses roughly a factor
$\log_2 n$ more space $O(N \log_2 N)$. If also $\log_2 m < \log_m n$ we achieve optimal space.
We conjecture that our lower bound for Sorting is tight for all values of M and
B, that is, an algorithm achieving an I/O-space product of $O(N^2/B)$ exists.

Abstract. Current IP routers are stateless: they forward individual
packets based on the destination address contained in the packet header, 
but maintain no information about the application or flow to which a 
packet belongs. This stateless service model works well for best effort 
datagram delivery, but is grossly inadequate for applications that require 
quality of service guarantees, such as audio, video, or multimedia. Main- 
taining state for each flow is expensive because the number of concurrent 
flows at a router can be in the hundreds of thousands. Thus, stateful so- 
lutions such as Intserv (integrated services) have not been adopted for 
their lack of scalability. Motivated by this dilemma, we formulate and 
solve the flow aggregation problem, where we give an efficient algorithm 
for computing the smallest set of aggregated flows that encode the for- 
warding state of individual flows. Such aggregation of state information 
might increase the viability of Intserv-type protocols.



1 Introduction 

Current IP networks provide one simple service: the best effort packet delivery, 
in which no guarantee is made about when or if a packet will be delivered. This 
simple model allows IP routers to be stateless: a router does not need to know 
anything about the potentially large number of individual connections passing 
through it; it simply forwards each IP packet based on the destination address 
contained in the packet header. The routing table entries are highly aggregated — 
a single entry like 10100* provides the next hop information for all destinations 
that start with prefix 10100. When multiple entries match a packet’s destination, 
the router uses the longest matching prefix rule to forward the packet mm- 
The best-effort service model works well when there is no congestion in the 
network and the end applications are relatively insensitive to delay (such as file 
transfer). In reality, parts of the network are frequently and heavily congested,
and a large number of emerging applications are real-time, meaning they are
extremely sensitive to delay, such as audio, video, or IP telephony. During network
congestion, a router needs to give priority to real-time traffic over non-real-time 
traffic, and thus adopt a “differentiated services” model. Such a differentiated 
services model is also attractive to ISPs (Internet Service Providers), who need 
better traffic management so they can offer different quality services to different 
customers at different prices. 

A differentiated service model can be implemented by maintaining per-flow 
state in the routers, as proposed by protocols like RSVP in the Intserv model |2|, 
m- Stateful routers can provide more powerful and flexible services such as 
bandwidth allocation, end-to-end latency bounds, protecting well-behaving flows 
from misbehaving ones, and end-to-end congestion control 0. Unfortunately, 
maintaining per-flow state in routers can be prohibitively expensive because the 
number of flows can be in the hundreds of thousands. Therefore stateful routers 
do not scale to large sizes as well as the stateless routers. In this paper, we 
formulate and solve a problem, called flow aggregation, which we hope can make 
stateful routers more scalable. Before we describe flow aggregation, let us briefly 
explain how current stateless routers forward packets. 

In the IP address scheme, each network is assigned a network address, and 
each host in that network uses the network address as its prefix. (The host ad- 
dress is fixed length, 32 bits, while the network addresses are variable length pre- 
fixes.) Each router maintains a routing table, containing a set of address prefixes; 
associated with each prefix is a “next hop” label. Thus, an entry (10100*, A) 
says that a packet whose destination address starts with 10100 should be for- 
warded to router A; the router A will forward the packet closer to the packet’s 
ultimate destination. (The symbol ‘*’ is the wildcard character.) 

Routing table entries are highly aggregated — while there are millions of IP 
hosts, the largest backbone routers have about 50 thousand prefixes El. This 
aggregation has several advantages — smaller table size reduces table memory, 
improves search time, and it also reduces the routing update traffic. The aggre- 
gation does have a cost — to look up a packet’s next hop, we need to find the 
longest prefix matching the header, which is a more complicated operation than 
a simple index into a table. For instance, suppose that a router has three prefixes 
0*, 010*, and 0101*, with corresponding next hops A, B, and C. Then, a packet
with destination address 01011 matches all three but is sent to C, the longest of 
the three matches. On the other hand, a packet with address 01101 is sent to A. 
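
The longest matching prefix rule from the example above can be sketched in a few lines. This is a minimal illustration using a linear scan, not how a router implements lookup; the function name `longest_prefix_match` and the dictionary representation of the table are assumptions made for the example.

```python
def longest_prefix_match(table, address):
    """Return the next hop of the longest prefix in `table` matching `address`.

    `table` maps prefix bit strings (written without the trailing '*') to
    next hops; `address` is a bit string. Returns None if nothing matches.
    """
    best_prefix = None
    for prefix in table:
        if address.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return None if best_prefix is None else table[best_prefix]


# The three-prefix router from the example in the text.
routing_table = {"0": "A", "010": "B", "0101": "C"}

print(longest_prefix_match(routing_table, "01011"))  # C
print(longest_prefix_match(routing_table, "01101"))  # A
```

A linear scan costs time proportional to the table size per lookup; it is used here only because it makes the rule itself easy to read.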

1.1 Flow-Based Routing 

The simple stateless routing, which works well when the network has sufficient 
capacity and no congestion, is grossly inadequate for real-time applications, such 
as audio or video, that have stringent delay requirements. Stateful routers can
implement more sophisticated routing and packet scheduling by using not just 
the destination address but additional packet header fields and by maintain- 
ing information about flows and applications. For instance, an ISP can provide 
guaranteed quality of service to a company by routing all traffic between two
company sites along a high bandwidth channel, which requires the routers in the
ISP network to maintain state information over network address pairs (src, dest). 
In this paper, we will use (src, dest) pairs to illustrate ideas, though all the results 
carry over for any header field pair. 

A flow is defined as a pair (src, dest), where src and dest are network address 
prefixes, each at most w bits long; in IP version 4, these addresses are at most 
32 bits. We define a flow routing entry to be a tuple ((src, dest), action), where 
action is the routing action associated with the flow (src, dest) . The routing 
action typically is the address of the next hop router to which the packet should 
be sent, but its exact semantics is irrelevant to our abstract framework; in some 
applications, the action could also take the form of “do not forward the packet” 
which is useful for access control m- 

We say that a flow routing entry (src, dest) matches a packet P if src is a pre- 
fix of the packet’s source address, and dest is a prefix of the packet’s destination 
address. Thus, a packet with header (0011, 1100) matches the flow (00*, 1*), but 
not the flow (00*, 10*). Let $\mathcal{D}$ denote a table of N flow routing entries. Given a
packet header P, it is possible that more than one flow entry of $\mathcal{D}$ matches P,
in which case we define the best matching flow, as follows. Suppose two flow
entries, $F_1$ and $F_2$, match P. We say that $F_1$ is a better match than $F_2$ if each
field of P has an equal or longer match with $F_1$ than with $F_2$. The best matching
flow of P is the flow that is a better match than any other matching flow in $\mathcal{D}$.
For instance, consider a packet header (0011, 1100) and two flow entries
$F_1$ = (001*, 110*) and $F_2$ = (00*, 1*). Then $F_1$ is the best matching flow for
the packet.
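
The matching and best-match definitions can be illustrated with a small sketch. The function names (`matches`, `is_better_match`, `best_matching_flow`) and the tuple-of-bit-strings representation are assumptions made for this example; prefixes are written without the trailing '*'.

```python
def matches(flow, header):
    """A flow (src, dest) matches a header (src_addr, dest_addr) when each
    prefix is a prefix of the corresponding address."""
    (src, dest), (src_addr, dest_addr) = flow, header
    return src_addr.startswith(src) and dest_addr.startswith(dest)


def is_better_match(f1, f2):
    """Among flows matching the same header, f1 is a better match than f2
    if its prefix is at least as long in each field."""
    return len(f1[0]) >= len(f2[0]) and len(f1[1]) >= len(f2[1])


def best_matching_flow(table, header):
    """Return the matching flow that is a better match than every other
    matching flow, or None if there is no such flow."""
    candidates = [f for f in table if matches(f, header)]
    for f in candidates:
        if all(is_better_match(f, g) for g in candidates):
            return f
    return None


header = ("0011", "1100")
f1, f2 = ("001", "110"), ("00", "1")
print(matches(("00", "1"), header))          # True
print(matches(("00", "10"), header))         # False
print(best_matching_flow([f1, f2], header))  # ('001', '110')
```

When two matching flows are each longer in a different field, neither is a better match and the sketch returns None, which is the situation the consistency requirement rules out.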

In order for the best matching flow to be well-defined, the flow entries must 
be consistent, that is, there cannot be two flow entries that partially overlap 
in the flow address space. We say that a flow routing table $\mathcal{D}$ is consistent if
for any two flow entries Fi and Fj either Fi and Fj are disjoint, or one is a 
subset of the other. Because the primary motivation for flow-based routing is 
to uniquely classify flows, we will be interested only in consistent flow routing 
tables. A related work P shows how to transform a set of possibly inconsistent
classifiers into a consistent one by adding additional entries.

1.2 Flow Aggregation and Our Contribution 

In a consistent flow routing table, each packet header has a unique best matching 
flow. We say that two flow tables are equivalent if each possible packet header 
receives the same routing action in both tables (using the best matching flow 
rule). We can define the flow aggregation problem as follows: Given a flow
routing table $\mathcal{D}$, compute another table $\mathcal{D}'$ that is equivalent to $\mathcal{D}$ and has the
smallest possible number of flow entries. As an example, consider a flow table
with the following four entries: ((00*, 10*), A), ((00*, 11*), A), ((01*, 10*), A), 
((01*, 11*), B) . The smallest equivalent table for this example has two entries: 
((0*,1*), A), ((01*, 11*), B). 
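
The equivalence of the two tables in this example can be checked by brute force over all headers. The sketch below is only a sanity check for tiny address widths, not the aggregation algorithm; `action_for`, `equivalent`, and the choice w = 2 are assumptions made for illustration.

```python
from itertools import product


def action_for(table, header):
    """Action of the most specific matching entry, or None if none matches.
    Assumes a consistent table, so matching entries are nested and the most
    specific one has the largest total prefix length."""
    src_addr, dest_addr = header
    best = None
    for (src, dest), action in table:
        if src_addr.startswith(src) and dest_addr.startswith(dest):
            if best is None or len(src) + len(dest) > best[0]:
                best = (len(src) + len(dest), action)
    return None if best is None else best[1]


def equivalent(t1, t2, w):
    """Compare the two tables on all 2^w * 2^w headers (tiny w only)."""
    addresses = ["".join(bits) for bits in product("01", repeat=w)]
    return all(action_for(t1, (s, d)) == action_for(t2, (s, d))
               for s in addresses for d in addresses)


original = [(("00", "10"), "A"), (("00", "11"), "A"),
            (("01", "10"), "A"), (("01", "11"), "B")]
aggregated = [(("0", "1"), "A"), (("01", "11"), "B")]
print(equivalent(original, aggregated, w=2))  # True
```

Headers not covered by either table receive no action (None) in both, so they do not affect the comparison.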

Our main result is a fast algorithm for determining the optimal aggregation. If 
the input table has N flow entries, and K distinct routing actions, and each field
(source or destination) has at most w bits, then our algorithm runs in worst-case
time $O(NKw^2)$; using quadtree-style path compression the worst-case
time can be improved to $O(NK)$, assuming w word size.

A pragmatic question one can ask is this: how are flow entries generated, 
and should one expect any significant aggregation to be achieved? Indeed, if 
the flow routing entries were manually generated by a network manager, then 
one would not expect any significant aggregation by running our algorithm. 
Instead, we expect the flow entries to be generated automatically by various 
algorithms that are being proposed for dynamic routing and traffic engineering. 
These protocols can generate a large number of flow entries, and since the number 
of distinct next hops at each router is much smaller (generally, tens or at most a 
few hundred in very large backbone routers) than the number of flows, significant 
aggregation may be achievable. There are also proposals for using packet traces 
at ISP boundaries to build virtual-circuit paths, such as in multi-protocol label 
switching (MPLS), which are basically flow routing entries. Like the IP prefix
aggregation in stateless routers, flow aggregation has the benefits of improved 
lookup time and reduced memory. (Reducing memory also leads to improvement 
in the lookup time, because a smaller data structure may fit entirely in the fast 
cache P|.) 

1.3 Previous Work 

A lot of work has been done in the networking community on congestion con- 
trol and end-to-end delay bounds assuming that routers maintain flow informa- 
tion mm- However, we have not seen any algorithmic work on aggregating 
flow state. 

The one-dimensional version of our algorithm solves the flow aggregation 
problem when flows are defined simply by destination-address prefixes. This 
turns out to be the prefix table aggregation problem, which was solved independently
by Draves et al. [Z], preceding our work by a few months. The main focus
and result in [Z] is prefix compaction, while our main motivation and contribution
is flow aggregation, which is a two-dimensional problem. We do not believe
that the algorithm in [Z] generalizes to flow aggregation, and we think our geometric
interpretation and resulting dynamic programming are central to solving
the flow problem. The flow aggregation problem is formulated as a geometric
compression problem in Section 2. We describe the one-dimensional version of
our dynamic program in Section 3, primarily to lay the groundwork for the
two-dimensional flow aggregation problem. In Section 4 we present our main result:
the flow aggregation algorithm. In Section 5 we present some extensions and
experimental results. Finally, we conclude in Section 6.



2 Flow Entries as Rectangles 

We interpret each flow entry as a geometric rectangle in the two-dimensional 
IP address space: the two axes are the source and the destination addresses.



Fig. 1: (a) An example showing 5 consistent rectangles. The number at the 
top-right corner of each rectangle gives its color (action). A point that lies 
in both rectangles 1 and 3, receives the color 3, corresponding to the more 
specific match. A point lying in rectangle 4 gets the color 4. (b) An example 
of two inconsistent rectangles. 



Since each address uses w bits, the domain is the integer line $[0, 2^w - 1]$ along
each axis. The source and destination fields are network address prefixes, and
each such prefix encodes a contiguous range of addresses. For instance, the prefix
101* corresponds to the closed interval [1010···0, 1011···1]. The prefix ranges
have the property that either two ranges are disjoint, or one contains the other.
The range of $s_1$ contains the range of $s_2$ precisely when $s_1$ is a prefix of $s_2$. For
instance, the range of 10* is a superset of the range of 10110*, but the ranges 
of 1010* and 110* are disjoint. A packet header has fully specified source and 
destination addresses, and thus corresponds to a point in the two-dimensional 
space. 
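
The correspondence between prefixes and address ranges can be made concrete with a short sketch. The helper names and the address width w = 8 are assumptions chosen for illustration; prefixes are written without the trailing '*'.

```python
def prefix_range(prefix, w):
    """Closed interval [lo, hi] of all w-bit addresses starting with `prefix`."""
    free_bits = w - len(prefix)
    lo = int(prefix or "0", 2) << free_bits   # pad the prefix with zeros
    hi = lo + (1 << free_bits) - 1            # pad the prefix with ones
    return lo, hi


def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def disjoint(a, b):
    return a[1] < b[0] or b[1] < a[0]


w = 8  # illustrative address width
print(prefix_range("101", w))                                      # (160, 191)
print(contains(prefix_range("10", w), prefix_range("10110", w)))   # True
print(disjoint(prefix_range("1010", w), prefix_range("110", w)))   # True
```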

A flow (s, d) corresponds to the rectangle whose projections are the ranges
of s and d in their respective dimensions. We denote this rectangle by R(s, d);
the points of R(s, d) are precisely the packet headers that match the flow (s, d).
To emphasize that we are dealing with special rectangles, we will use the term
prefix rectangle. We say that two prefix rectangles are consistent if they are
either disjoint, or one contains the other. The flow table $\mathcal{D}$ is consistent if all its
flow entries are pairwise consistent. Figure 1 shows examples of consistent and
inconsistent rectangles.
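
Pairwise consistency of prefix rectangles can be tested directly from the definition. The following is a quadratic-time sketch for illustration only; the function names, the example flows, and the width w = 4 are assumptions, not taken from the figure.

```python
from itertools import combinations


def prefix_range(prefix, w):
    """Closed interval of w-bit addresses that start with `prefix`."""
    free_bits = w - len(prefix)
    lo = int(prefix or "0", 2) << free_bits
    return lo, lo + (1 << free_bits) - 1


def rectangle(flow, w):
    """Prefix rectangle R(s, d) as a pair of closed intervals."""
    s, d = flow
    return prefix_range(s, w), prefix_range(d, w)


def contains(outer, inner):
    return all(o[0] <= i[0] and i[1] <= o[1] for o, i in zip(outer, inner))


def disjoint(a, b):
    # Rectangles are disjoint if their projections are disjoint on some axis.
    return any(x[1] < y[0] or y[1] < x[0] for x, y in zip(a, b))


def is_consistent(flows, w):
    rects = [rectangle(f, w) for f in flows]
    return all(disjoint(a, b) or contains(a, b) or contains(b, a)
               for a, b in combinations(rects, 2))


w = 4  # illustrative address width
print(is_consistent([("0", "1"), ("01", "11"), ("1", "0")], w))  # True
print(is_consistent([("0", "11"), ("01", "1")], w))              # False
```

In the second call the rectangles overlap but neither contains the other: one is wider in the source dimension, the other in the destination dimension.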

Consider a flow routing table $\mathcal{D}$ with N flow entries. These flows map to
N prefix rectangles in the two-dimensional space $[0, 2^w - 1] \times [0, 2^w - 1]$. We
let each distinct action associated with our flows be represented by a color,
where colors are integers numbered from one to K. Thus, we can think of a
flow tuple ((s, d), $action_i$) as a prefix rectangle with color i. Since each packet
must be classified into some flow, we assume, without loss of generality, that the
prefix rectangles of $\mathcal{D}$ completely cover the two-dimensional space
$[0, 2^w - 1] \times [0, 2^w - 1]$. The flow classification induced by $\mathcal{D}$ is the mapping from packet
headers (points) to the set of colors. Using the best matching flow rule, each
packet header receives a unique color: the color assigned to a point is the color
of the smallest (most specific) rectangle containing the point. (Refer to Figure 1.)

Fig. 2: An example of flow aggregation. Fig. (i) shows an input with 8 rectangles,
and Fig. (ii) shows an optimal solution using 5 rectangles.



We can now formulate the flow aggregation problem. Given N prefix rectan- 
gles with colors in {1,2, . . . , K}, determine the smallest set of consistent prefix 
rectangles and their colors that induce the same coloring as the input set.
Figure 2 shows an example. We begin by considering the problem in one dimension
to help us develop the main idea for our algorithm. 

3 Aggregation in One Dimension 

Consider a set of N prefixes $\mathcal{D} = \{s_1, s_2, \ldots, s_N\}$, where each $s_i$ is a binary
bit string of length at most w, and the ith string is assigned color $c_i$, with
$c_i \in \{1, 2, \ldots, K\}$. Each string $s_i$ corresponds to a contiguous interval on the
line $[0, 2^w - 1]$, which we call the prefix range of $s_i$, and denote by $R(s_i)$.
The set of N prefix ranges partitions the line $[0, 2^w - 1]$ into at most $2N - 1$
“elementary intervals,” where each elementary interval is the interval between
two consecutive range endpoints. Assign to each elementary interval the color of
the smallest range containing that interval. Under this coloring rule, the prefix
set $\mathcal{D}$ is a mapping from the points of the line $[0, 2^w - 1]$ to the color set
$\{1, 2, \ldots, K\}$. Given a point P, we let $\mathcal{D}(P)$ denote the color assigned to P by
the set $\mathcal{D}$. Figure 3 shows an example, where a set of prefixes partition the line
into six elementary intervals. The colors assigned to these intervals, in left to
right order, are 2, 1, 2, 3, 2, 3.
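
The coloring rule can be sketched as follows. The prefix set used here is a made-up example with w = 3, not the one shown in the figure, and the helper names are assumptions; prefixes are written without the trailing '*', so the empty string stands for the null prefix.

```python
def prefix_range(prefix, w):
    """Closed interval of w-bit addresses that start with `prefix`."""
    free_bits = w - len(prefix)
    lo = int(prefix or "0", 2) << free_bits
    return lo, lo + (1 << free_bits) - 1


def color_of(prefixes, point, w):
    """Color of the smallest prefix range containing `point` (None if uncovered).
    `prefixes` maps prefix bit strings to colors."""
    best = None
    for prefix, color in prefixes.items():
        lo, hi = prefix_range(prefix, w)
        if lo <= point <= hi and (best is None or len(prefix) > len(best[0])):
            best = (prefix, color)
    return None if best is None else best[1]


def elementary_intervals(prefixes, w):
    """Intervals between consecutive range endpoints, each with its color."""
    cuts = set()
    for prefix in prefixes:
        lo, hi = prefix_range(prefix, w)
        cuts.update((lo, hi + 1))
    cuts = sorted(cuts)
    return [((a, b - 1), color_of(prefixes, a, w)) for a, b in zip(cuts, cuts[1:])]


# Hypothetical input: four prefixes over 3-bit addresses.
prefixes = {"": 2, "00": 1, "1": 3, "101": 2}
print(elementary_intervals(prefixes, w=3))
# [((0, 1), 1), ((2, 3), 2), ((4, 4), 3), ((5, 5), 2), ((6, 7), 3)]
```

With N = 4 prefixes the sketch produces five elementary intervals, within the 2N - 1 = 7 bound stated above.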

We say that two prefix sets $\mathcal{D}$ and $\mathcal{D}'$ are equivalent if they induce the same
coloring on the line $[0, 2^w - 1]$. That is, $\mathcal{D}'(P) = \mathcal{D}(P)$, for all $P \in [0, 2^w - 1]$.
The one-dimensional prefix aggregation problem can be formulated as follows:
Given a set of prefixes $\mathcal{D}$, find the smallest prefix set $\mathcal{D}'$ that is equivalent to $\mathcal{D}$.
Figure 3 (ii) shows the optimal solution for the example in (i); the number next
to each prefix range is its color.

Our algorithm uses dynamic programming to compute the optimal set $\mathcal{D}'$.
We divide a prefix range into two halves, and then try to combine their optimal
solutions. One difficulty with this obvious approach is that the combined cost
may depend on the actual subproblem solutions. Consider, for instance, the case
where we have four equal-length elementary intervals colored 1, 2, 3, 1. The left
half subproblem has an optimal solution {1, 2}; the right half subproblem has
an optimal solution {3, 1}. But adding them together does not give the optimal
solution, which has only three prefixes. With this motivation, let us introduce
the concept of a background prefix.

Fig. 3: An example of aggregation in one dimension. Fig. (i) shows an input
with 6 prefix ranges, and Fig. (ii) shows an optimal solution using 4 prefix
ranges.

Consider a prefix s, and its range $R(s) \subseteq [0, 2^w - 1]$. Suppose we just want
to solve the coloring subproblem for the range R(s). We say that a solution G
for R(s) contains a background prefix if $s \in G$; that is, one of the prefix ranges
in G is the whole interval R(s). The background color of G is the color of the
background prefix. Fig. 3 (ii) shows an example that has a background prefix
with color 2, while the set of prefixes in Fig. 3 (i) does not contain a background
prefix. Our dynamic programming algorithm will use the key observation that
it is sufficient to consider solutions in which background colors are well-defined.

Lemma 1. Every solution of the coloring problem for a prefix range R(s) can
be modified into a solution of equal cost with a background prefix.

Proof. Consider a solution without a background prefix. Pick a prefix p in this
solution such that the range R(p) is not contained in any other prefix’s range.
Replace p by s, and give it p’s color.

3.1 The Dynamic Programming Algorithm 

We are given a set $\mathcal{D} = \{s_1, s_2, \ldots, s_N\}$ of N prefixes, where each $s_i$ is a binary
bit string of maximum length w, and the ith string is assigned color $c_i$, with
$c_i \in \{1, 2, \ldots, K\}$. Consider the coloring induced by $\mathcal{D}$ on the line $[0, 2^w - 1]$:
a point has the color of the smallest prefix range in which it lies. (Note that
the fewer the bits in a prefix $s_i$, the longer the corresponding range $R(s_i)$ is. The
null string * corresponds to the whole range $[0, 2^w - 1]$, while a full w-bit string
maps to a point.) We start by building a partition of $[0, 2^w - 1]$ in which each
piece is monochromatic and each interval has length a binary power. That is, we
recursively divide the line $[0, 2^w - 1]$ into two equal halves until each piece is
monochromatic. Because $\mathcal{D}$ has N prefixes, and each prefix has at most w bits,
our final subdivision has size at most wN.

Let $p_1, p_2, \ldots, p_M$, where $M \le wN$, denote the prefixes that correspond
to the monochromatic intervals in the final subdivision. We call the $R(p_i)$’s
monochromatic binary intervals. These intervals are the basic subproblems for
our dynamic program’s initialization. Given an arbitrary prefix range
$R(s) \subseteq [0, 2^w - 1]$, and a color $c \in \{1, 2, \ldots, K\}$, let us define

cost(s, c) = value of an optimal solution for the range R(s) with background
color c.

We initialize this cost function for the monochromatic binary intervals $R(p_i)$,
as follows. Let $c_0(p_i)$ be the color of the interval $R(p_i)$; that is, $c_0(p_i)$ is the color
induced on $R(p_i)$ by the input prefix set $\mathcal{D}$. Then, for i = 1, 2, ..., M, we let
$cost(p_i, c) = 1$ if $c = c_0(p_i)$, and $cost(p_i, c) = \infty$ otherwise. The following lemma
gives the general formula for this cost function. Given a prefix s, we use s0 and
s1 to denote strings obtained by appending to s a 0 and a 1, respectively.

Lemma 2. Let s be an arbitrary prefix. Then, for $c_i = 1, 2, \ldots, K$,



Proof. Omitted from this extended abstract for lack of space. 

If the input $\mathcal{D}$ has N prefixes, the number of colors is K, and the prefixes are
w bits long, then the dynamic program based on Lemma 2 takes O(NKw) time
and space. When we implemented our algorithm, we found that the worst-case
memory requirement for this algorithm was too large to be of practical
value. For instance, for the practical values of interest N = 50,000, w = 32, and
K = 256, the dynamic program needs to construct a table of size $4 \times 10^8$. Even
assuming that each entry takes just 1 word of memory, the worst-case memory
requirement for this algorithm is 3200 MB of memory! This motivated us to look
for an improved algorithm, which we describe in the next section. Not only does 
the new algorithm require significantly less memory in practice, but it is also 
simpler and faster. 
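
A dynamic program over cost(s, c) can be sketched as follows. The recurrence in this sketch is an assumption: it is written to agree with the initialization above and with the background-prefix idea (a child whose background color equals the parent's can drop its own background prefix, saving one), and it is not quoted from Lemma 2. The input format, a list of leaf colors whose length is a power of two, is also an assumption made for illustration.

```python
import math


def cost_table(leaf_colors, colors):
    """Return {c: cost(s, c)} for the range covering `leaf_colors`.

    `leaf_colors` lists the colors of equal-length pieces of the range,
    left to right; its length must be a power of two.
    """
    if len(set(leaf_colors)) == 1:
        # Monochromatic binary interval: the initialization from the text.
        return {c: (1 if c == leaf_colors[0] else math.inf) for c in colors}

    half = len(leaf_colors) // 2
    left = cost_table(leaf_colors[:half], colors)
    right = cost_table(leaf_colors[half:], colors)

    def child_cost(child, c):
        # Assumed rule: either the child reuses background color c (its own
        # background prefix becomes redundant), or it keeps its cheapest solution.
        return min(child[c] - 1, min(child.values()))

    # One prefix for the background of this range, plus both halves.
    return {c: 1 + child_cost(left, c) + child_cost(right, c) for c in colors}


costs = cost_table([1, 2, 3, 1], colors=[1, 2, 3])
print(costs)                # {1: 3, 2: 4, 3: 4}
print(min(costs.values()))  # 3
```

On the four-interval example colored 1, 2, 3, 1 the sketch reports an optimum of three prefixes, matching the count stated in the text, and it keeps one cost per color for every subproblem, which is the source of the O(NKw) table size.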

3.2 An Improved Dynamic Program 

Intuitively, maintaining K distinct solutions, one for each background color, for 
every subproblem seems like overkill. (If we were only interested in the value
of the solution, then we could of course choose not to store the intermediate
solutions. However, they are needed for constructing the optimal prefix set.) But 
as we saw earlier, keeping just one optimal solution does not work. However, we 
show below that storing just the background colors that give the smallest cost 
for each subproblem suffices. Let s be an arbitrary prefix, and let $\mathcal{L}(s)$ be the
list of background colors that give the minimum cost solutions for R(s). That is,

$$\mathcal{L}(s) = \{ c_i \mid cost(s, c_i) \le cost(s, c_j),\ 1 \le i, j \le K \}.$$

Again, we initialize these lists for the monochromatic binary intervals by
setting $\mathcal{L}(p) = \{c_0(p)\}$. The following lemma shows how to compute these lists
in a bottom-up merge. (Recall that s0 and s1 are prefixes obtained by appending
0 and 1 to the prefix s.)

Lemma 3. Suppose s is an arbitrary prefix. Then,

$$\mathcal{L}(s) = \begin{cases} \mathcal{L}(s0) \cap \mathcal{L}(s1) & \text{if } \mathcal{L}(s0) \cap \mathcal{L}(s1) \neq \emptyset, \\ \mathcal{L}(s0) \cup \mathcal{L}(s1) & \text{otherwise.} \end{cases}$$



Proof. We first consider the case $\mathcal{L}(s0) \cap \mathcal{L}(s1) \ne \emptyset$. Since all colors in the
intersection set $\mathcal{L}(s0) \cap \mathcal{L}(s1)$ are equivalent, it suffices to show that
$\mathrm{cost}(s,c_i) < \mathrm{cost}(s,c)$ whenever $c_i \in \mathcal{L}(s0) \cap \mathcal{L}(s1)$ and $c \notin \mathcal{L}(s0) \cap \mathcal{L}(s1)$.
It is easy to see that

$$\mathrm{cost}(s,c_i) = \mathrm{cost}(s0,c_i) + \mathrm{cost}(s1,c_i) - 1.$$

(That is, the minimum is achieved by the first term in the expression of the earlier cost lemma.)
Assume, without loss of generality, that $c \notin \mathcal{L}(s0)$. Then we must have
$\mathrm{cost}(s0,c) \ge 1 + \mathrm{cost}(s0,c_i)$, and $\mathrm{cost}(s1,c) \ge \mathrm{cost}(s1,c_i)$; if $c \notin \mathcal{L}(s1)$, the second
inequality is strict. Now, it is easy to check that

$$\mathrm{cost}(s,c) = \min\{\ldots\},$$

where the minimum ranges over the terms of the same expression, evaluated at $c$.
Since $\mathrm{cost}(s0,c) \ge 1 + \mathrm{cost}(s0,c_i)$, and $\mathrm{cost}(s1,c) \ge \mathrm{cost}(s1,c_i)$, it follows
that $\mathrm{cost}(s,c) \ge \mathrm{cost}(s0,c_i) + \mathrm{cost}(s1,c_i) > \mathrm{cost}(s,c_i)$, which proves the claim.

Next, consider the case $\mathcal{L}(s0) \cap \mathcal{L}(s1) = \emptyset$. In this case we show that $\mathrm{cost}(s,c_i)
< \mathrm{cost}(s,c)$ whenever $c_i \in \mathcal{L}(s0) \cup \mathcal{L}(s1)$ and $c \notin \mathcal{L}(s0) \cup \mathcal{L}(s1)$. Let us assume
that $c_i \in \mathcal{L}(s0)$, and thus $c_i \notin \mathcal{L}(s1)$. Then,

$$\mathrm{cost}(s,c_i) = \mathrm{cost}(s0,c_i) + \mathrm{cost}(s1,c_j),$$

for any $c_j \in \mathcal{L}(s1)$. Now, since $c \notin \mathcal{L}(s0) \cup \mathcal{L}(s1)$, we have $\mathrm{cost}(s0,c) \ge 1 +
\mathrm{cost}(s0,c_i)$, and $\mathrm{cost}(s1,c) \ge 1 + \mathrm{cost}(s1,c_j)$. Since $\mathrm{cost}(s,c) = \min\{\ldots\}$ as
before, it follows that $\mathrm{cost}(s,c) \ge 1 + \mathrm{cost}(s0,c_i) + \mathrm{cost}(s1,c_j) > \mathrm{cost}(s,c_i)$, which
completes the proof.

Lemma 3 gives a straightforward dynamic programming algorithm. Starting
from the initial color lists of the monochromatic binary intervals, the algorithm
computes the lists for increasingly longer prefix ranges. When computing the list
for prefix $s$, we set $\mathcal{L}(s) = \mathcal{L}(s0) \cap \mathcal{L}(s1)$ if $\mathcal{L}(s0) \cap \mathcal{L}(s1) \ne \emptyset$; otherwise
$\mathcal{L}(s) = \mathcal{L}(s0) \cup \mathcal{L}(s1)$. Once all the lists have been computed, we can determine
an optimal color assignment by a top-down traversal. (Details are presented in
the full paper.)

The worst-case complexity of the preceding algorithm is $O(NKw)$, since
there are $O(Nw)$ subproblems, and the size of a color list is at most $K$. Thus,
from a worst-case point of view, the dynamic program based on Lemma 3 is not
much better than the original one. However, in practice we found that the list
sizes were much smaller than the total number of colors, and thus the memory
requirement was substantially improved. We next describe our main result: the
dynamic programming algorithm for the flow aggregation.
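
The bottom-up merge of Lemma 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `aggregate_prefixes`, the representation of a routing table as a dictionary from bit-string prefixes to colors, and the requirement that the empty prefix (a default route) be present are all assumptions made for the example. It returns the list $\mathcal{L}(s)$ of the root together with the optimal number of prefixes; the top-down pass that recovers the actual prefix set is omitted.

```python
def aggregate_prefixes(table):
    """Return (color list, optimal cost) for the whole address space.

    table: dict mapping bit-string prefixes to colors (hypothetical input
    format); it must contain the empty prefix "" as a default route.
    """
    def solve(s, inherited):
        # Color seen at s under longest-prefix matching.
        color = table.get(s, inherited)
        # Monochromatic interval: no input prefix properly extends s.
        if not any(p != s and p.startswith(s) for p in table):
            return {color}, 1
        list0, cost0 = solve(s + "0", color)
        list1, cost1 = solve(s + "1", color)
        common = list0 & list1
        if common:
            # A shared background color saves one prefix (Lemma 3, first case).
            return common, cost0 + cost1 - 1
        # No shared color: keep every color that is optimal for either half.
        return list0 | list1, cost0 + cost1

    return solve("", None)


colors, cost = aggregate_prefixes({"": "A", "00": "B", "01": "B", "1": "A"})
print(colors, cost)
```

The sketch rescans the whole table at every node, which is wasteful; a trie would avoid that, but the merge rule itself is unchanged.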



4 Optimal Flow Aggregation 

Consider a set $\mathcal{D}$ of $N$ consistent flows. Each flow $(s,d)$ corresponds to a rectangle
$R(s,d)$ in the two-dimensional space $[0, 2^w - 1] \times [0, 2^w - 1]$. The color of $R(s,d)$
is the color (action) associated with flow $(s,d)$. Using the best matching flow rule,
the set $\mathcal{D}$ gives a mapping from the set of points $[0, 2^w - 1] \times [0, 2^w - 1]$ to the
set of colors. Let $\mathcal{D}(P)$ denote the color assigned to point $P$ by $\mathcal{D}$. Geometrically,
$\mathcal{D}(P)$ is the color of the smallest rectangle containing $P$. Our goal is to find the
smallest set of consistent flows $\mathcal{D}'$ that realizes the same coloring map as $\mathcal{D}$;
that is, $\mathcal{D}(P) = \mathcal{D}'(P)$ for all points $P$. Our algorithm generalizes the dynamic
program of the preceding section.

We start with the observation that any solution can be modified to contain 
a background flow. The background flow for a prefix rectangle $R(s,d)$ is the flow
$(s,d)$. We say that a solution $G$ for the rectangle $R(s,d)$ contains the background
flow if $(s,d) \in G$. The background color of $G$ is the color assigned to the flow
(s,d). Fig.0 (ii) shows an example that has a background flow of color 1; the set 
of flows in Fig. 0(i) does not contain a background flow. The following generalizes 
the background prefix lemma; we omit its easy proof in this abstract. 

Lemma 4. Every solution of the coloring problem for a prefix rectangle $R(s,d)$
can be modified into a solution of equal cost with a background flow.

Given a prefix rectangle $R(s,d)$, and a color $c \in \{1,2,\ldots,K\}$, define

$\mathrm{cost}(s,d,c)$ = value of an optimal solution for rectangle $R(s,d)$ with
background color $c$.

The following lemma gives the general formula for this cost function. (Recall
that we use the notation $x0$ (resp. $x1$) to denote the bit string $x$ with 0 (resp.
1) appended.)

Lemma 5. Given a prefix rectangle $R(s,d)$, and a color $c_i \in \{1,2,\ldots,K\}$, we
have

$$\mathrm{cost}(s,d,c_i) = \min \left\{ \begin{array}{l}
\mathrm{cost}(s0,d,c_i) + \mathrm{cost}(s1,d,c_i) - 1 \\
\mathrm{cost}(s0,d,c_i) + \mathrm{cost}(s1,d,c_j) \\
\mathrm{cost}(s0,d,c_j) + \mathrm{cost}(s1,d,c_i) \\
\mathrm{cost}(s,d0,c_i) + \mathrm{cost}(s,d1,c_i) - 1 \\
\mathrm{cost}(s,d0,c_i) + \mathrm{cost}(s,d1,c_j) \\
\mathrm{cost}(s,d0,c_j) + \mathrm{cost}(s,d1,c_i)
\end{array} \right.$$

Fig. 4: $F$ spans $R(s,d)$ along the $s$-axis; $G$ spans it along the $d$-axis.
(Axis labels in the figure: $s$; $d0$, $d1$.)



Proof. Omitted due to lack of space. 

The key insight in the preceding dynamic program is the following geometric
fact: since we are computing consistent rectangles, an optimal solution cannot
have two prefix rectangles that cross each other. Thus, an optimal solution for
$R(s,d)$ with background color $c$ must be composed of either the optimal solutions
of the left and right half subproblems, or the top and bottom half subproblems.
More specifically, let us introduce the following definition.

We say that a prefix rectangle $R' = R(s',d')$ spans $R(s,d)$ along the $s$-axis
(resp. $d$-axis) if $s = s'$ and $d$ is a prefix of $d'$ (resp. $d = d'$ and $s$ is a prefix of $s'$).
Figure 4 illustrates this definition. Consistency implies that an optimal solution
of $R(s,d)$, with any background color, cannot have rectangles spanning $R(s,d)$
along both axes. Absence of a rectangle spanning along the $s$-axis (resp. $d$-axis)
allows combining left and right (resp. top and bottom) subproblem solutions.

4.1 An Improved Algorithm 

As in the one-dimensional case, the dynamic program can be improved in practice
(though not in the worst case) by maintaining the list of only those background
colors that give optimal solutions. Let $\mathcal{L}(s,d)$ denote the list of colors that achieve
minimum cost for the coloring subproblem $R(s,d)$. That is,

$$\mathcal{L}(s,d) = \{\, c_i \mid \mathrm{cost}(s,d,c_i) \le \mathrm{cost}(s,d,c_j),\ 1 \le c_i, c_j \le K \,\}.$$

We use the notation $\mathrm{cost}(s,d)$ to denote the minimum cost of $R(s,d)$ over all
colors; that is, $\mathrm{cost}(s,d) = \min_i \mathrm{cost}(s,d,c_i)$. In the following, we use the term
“input rectangle” to mean the prefix rectangle corresponding to a flow in the
input set $\mathcal{D}$.



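Flow-Aggregate $(s, d)$

1. If no input rectangle of $\mathcal{D}$ lies entirely inside $R(s,d)$, then all points mapped
   to the region $R(s,d)$ receive the same color $c$. In this case, we set $\mathcal{L}(s,d) =
   \{c\}$, $\mathrm{cost}(s,d) = 1$, and return.

2. If no input rectangle of $\mathcal{D}$ spans $R(s,d)$ along the $s$-axis, then do the following:

   if $\mathcal{L}(s0,d) \cap \mathcal{L}(s1,d) \ne \emptyset$ then
       $\mathrm{cost}_v(s,d) = \mathrm{cost}(s0,d) + \mathrm{cost}(s1,d) - 1$; $\mathcal{L}_v(s,d) = \mathcal{L}(s0,d) \cap \mathcal{L}(s1,d)$
   else $\mathrm{cost}_v(s,d) = \mathrm{cost}(s0,d) + \mathrm{cost}(s1,d)$; $\mathcal{L}_v(s,d) = \mathcal{L}(s0,d) \cup \mathcal{L}(s1,d)$

   Otherwise, set $\mathrm{cost}_v(s,d) = \infty$.

3. If no input rectangle of $\mathcal{D}$ spans $R(s,d)$ along the $d$-axis, then do the following:

   if $\mathcal{L}(s,d0) \cap \mathcal{L}(s,d1) \ne \emptyset$ then
       $\mathrm{cost}_h(s,d) = \mathrm{cost}(s,d0) + \mathrm{cost}(s,d1) - 1$; $\mathcal{L}_h(s,d) = \mathcal{L}(s,d0) \cap \mathcal{L}(s,d1)$
   else $\mathrm{cost}_h(s,d) = \mathrm{cost}(s,d0) + \mathrm{cost}(s,d1)$; $\mathcal{L}_h(s,d) = \mathcal{L}(s,d0) \cup \mathcal{L}(s,d1)$

   Otherwise, set $\mathrm{cost}_h(s,d) = \infty$.

4. if $\mathrm{cost}_v(s,d) > \mathrm{cost}_h(s,d)$ then
       $\mathrm{cost}(s,d) = \mathrm{cost}_h(s,d)$; $\mathcal{L}(s,d) = \mathcal{L}_h(s,d)$
   else if $\mathrm{cost}_v(s,d) < \mathrm{cost}_h(s,d)$ then
       $\mathrm{cost}(s,d) = \mathrm{cost}_v(s,d)$; $\mathcal{L}(s,d) = \mathcal{L}_v(s,d)$
   else $\mathrm{cost}(s,d) = \mathrm{cost}_v(s,d)$; $\mathcal{L}(s,d) = \mathcal{L}_h(s,d) \cup \mathcal{L}_v(s,d)$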

The code above describes a generic call on an arbitrary prefix rectangle
$R(s,d)$. The initial call is made on the subspace corresponding to the
rectangle $[0, 2^w - 1] \times [0, 2^w - 1]$. In the code, $\mathrm{cost}_h$, $\mathrm{cost}_v$, $\mathcal{L}_h$ and $\mathcal{L}_v$ are
temporary variables used for comparing the solutions obtained by either combining
the left and right halves of $R(s,d)$, or the top and bottom halves. Due to lack
of space, we omit the proof of correctness of this algorithm.
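
The four steps of Flow-Aggregate translate fairly directly into a recursive procedure. The Python sketch below is an illustration under stated assumptions, not the authors' code: flows are given as a dictionary from pairs of bit-string prefixes to colors, the default flow `("", "")` is assumed to be present, input rectangles are assumed to be pairwise nested or disjoint, and "inside" and "spans" are interpreted as proper containment. The names `flow_aggregate`, `background`, `merge` and `solve` are hypothetical.

```python
from functools import lru_cache
from math import inf


def flow_aggregate(flows):
    """Return (color list, optimal cost) for the whole (s, d) space."""
    keys = list(flows)

    def background(s, d):
        # Color of the smallest input rectangle containing R(s, d).
        best = max((k for k in keys if s.startswith(k[0]) and d.startswith(k[1])),
                   key=lambda k: len(k[0]) + len(k[1]))
        return flows[best]

    def merge(first, second):
        (list0, cost0), (list1, cost1) = first, second
        common = list0 & list1
        if common:
            return common, cost0 + cost1 - 1
        return list0 | list1, cost0 + cost1

    @lru_cache(maxsize=None)
    def solve(s, d):
        inside = [k for k in keys
                  if k != (s, d) and k[0].startswith(s) and k[1].startswith(d)]
        if not inside:                              # step 1: one color only
            return frozenset([background(s, d)]), 1
        list_v, cost_v = frozenset(), inf
        list_h, cost_h = frozenset(), inf
        if not any(k[0] == s for k in inside):      # step 2: split along s
            list_v, cost_v = merge(solve(s + "0", d), solve(s + "1", d))
        if not any(k[1] == d for k in inside):      # step 3: split along d
            list_h, cost_h = merge(solve(s, d + "0"), solve(s, d + "1"))
        if cost_v > cost_h:                         # step 4: keep the better split
            return list_h, cost_h
        if cost_v < cost_h:
            return list_v, cost_v
        return list_v | list_h, cost_v

    return solve("", "")


colors, cost = flow_aggregate({("", ""): "A", ("0", "0"): "B", ("0", "1"): "B"})
print(sorted(colors), cost)
```

Memoizing on `(s, d)` is what keeps the number of evaluated subproblems bounded; the linear scans over `keys` are kept only for brevity.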

In order to analyze the running time of this dynamic program, we observe
that a subproblem $R(s,d)$ makes a recursive call only if $R(s,d)$ contains at
least one input rectangle of $\mathcal{D}$ inside it. We can show that the total number of
subproblems is $O(Nw)$, the cost of deciding if a rectangular region is spanned
by some filter is $O(w)$, and the cost of maintaining color lists per subproblem is
$O(K)$. Thus, the total time and space complexity of the algorithm is $O(NKw^2)$
in the worst case.

Theorem 1. Given a set of $N$ consistent flows, with $K$ distinct colors and at
most $w$-bit prefixes, we can compute an optimal flow aggregation in $O(NKw^2)$
worst case time.

5 Extensions and Experimental Results 

5.1 Improving Time Complexity by Path Compression 

The dynamic programs of Sections 3 and 4 can be improved to eliminate the
$w$ factors, thus resulting in the worst-case running time and space $O(NK)$.
The $w$ factors arise due to long non-branching paths in the recursion tree. A
standard quadtree-style path compression can eliminate such paths, by shrinking
the rectangle $R(s,d)$ in each step to ensure that each recursive call separates two
input rectangles.

Table 1: One-dimensional prefix aggregation. When multiple next hops were
available for a prefix, we initialized the corresponding color list with all those
next hops. The input and output are the number of prefixes.

Database    Input    Output   Reduction   Memory   Time
Mae-East    41455    23680    42.88%      3.8 MB   2.73 s
PacBell     24728    14168    42.70%      2.1 MB   1.85 s
Paix         7982     5888    26.23%      0.8 MB   0.72 s


Minimizing the Bit Complexity. We have used the number of flows as our
complexity measure. Instead one could ask to minimize the total bit complexity
of the flow routing table. Algorithms that use tries or bit vectors for flow
classification are sensitive to the total number of bits in the routing
database. Given a flow $f = (s,d)$, let $b(f)$ denote the bit length of $s$ plus the bit
length of $d$. Then, the bit complexity of a flow routing table $\mathcal{D} = \{f_1, f_2, \ldots, f_n\}$
is $\sum_{i=1}^{n} b(f_i)$. We could ask for a routing table of minimum bit complexity that
is equivalent to $\mathcal{D}$. It turns out that our dynamic programs also minimize the
bit complexity of the output table.
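
As a small illustration of the definition, the bit complexity of a table can be computed directly from the prefix lengths. The representation of flows as pairs of bit strings and the name `bit_complexity` are assumptions made for this sketch.

```python
def bit_complexity(table):
    """Sum of b(f) = len(s) + len(d) over all flows f = (s, d) in the table."""
    return sum(len(s) + len(d) for s, d in table)


# Hypothetical three-flow table: the default flow contributes 0 bits.
example = [("", ""), ("0", ""), ("01", "1")]
print(bit_complexity(example))
```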

5.2 Experimental Results

We implemented our dynamic programming algorithms, for both one- and
two-dimensional aggregation. We do not have any publicly available flow databases
to test our two-dimensional algorithm, since stateful routers are still in their
infancy. On the other hand, prefix tables are widely available for large backbone
routers, so we were able to test our one-dimensional aggregation algorithm. We
ran our algorithm on three publicly available routing tables, obtained from
the Mae-East Exchange Point. The number of prefixes in these databases
varied from about 8000 (Paix) to about 41000 (Mae-East). The total number
of colors (distinct next hops) varied from 17 to 58. Table 1 shows our
results. While one-dimensional results are no indication of the two-dimensional
problem, it should be encouraging that our prefix aggregation algorithm achieves
compression of 30-40% even in these highly aggregated prefix tables. It therefore
appears likely that significant aggregation might be possible in the flow routing
tables, which are going to be automatically generated.
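
The Reduction column of Table 1 follows from the Input and Output columns as (input - output) / input. The short check below recomputes it from the numbers printed in the table; the dictionary layout is just a convenience for the example.

```python
# (input prefixes, output prefixes) per database, copied from Table 1.
table1 = {
    "Mae-East": (41455, 23680),
    "PacBell": (24728, 14168),
    "Paix": (7982, 5888),
}

reduction = {
    name: round(100 * (before - after) / before, 2)
    for name, (before, after) in table1.items()
}
for name, percent in reduction.items():
    print(f"{name}: {percent:.2f}%")
```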

6 Concluding Remarks 

We gave an efficient algorithm for computing an optimal flow aggregation for
reducing state information in IP routers. The algorithm is relatively simple,
and exploits some basic geometric properties of consistent prefix rectangles in
two dimensions. The basic dynamic programming algorithm runs in $O(NKw^2)$
worst-case time, for $N$ flows with $K$ colors and $w$-bit prefixes. While the improved
dynamic program does not reduce the worst-case complexity, it should
be substantially better in practice. With path compression in the recursion tree,
the worst-case time can be reduced to $O(NK)$.

IP routers certainly need to move beyond the current best-effort service
model, if they are to be used for advanced services like audio, video, or IP
telephony. History has shown that highly stateful solutions like ATM
(asynchronous transfer mode) have failed to be widely adopted despite their
ability to provide quality of service. Achieving similar capabilities in IP
routers with minimal per-flow state appears to be the most promising alternative.
Our hope is that algorithms like ours for flow aggregation will make stateful
routers more scalable, and thus more acceptable.

Abstract. In this paper, we discuss the problem of computing an optimal
rounding of a real sequence (resp. matrix) into an integral sequence
(resp. matrix). Our criterion of optimality is to minimize the weighted
$l_\infty$ distance $\mathrm{Dist}_\infty^{\mathcal{F},w}(A,B)$ between an input sequence (resp. matrix) $A$
and the output $B$. The distance depends on a family $\mathcal{F}$ of intervals
(resp. rectangular regions) for the sequence rounding (resp. matrix
rounding) and a positive-valued weight function $w$ on the family. We give
efficient polynomial time algorithms for the sequence-rounding problem,
one for the weighted $l_\infty$ distance, and the other for any weight function $w$,
for any family $\mathcal{F}$ of intervals. We give an algorithm that computes a matrix
rounding with an error at most 1.75 with respect to the unweighted
$l_\infty$ distance associated with the family $\mathcal{W}_2$ of all $2 \times 2$ square regions,
whereas we prove that it is NP-hard to compute an approximate solution
to the matrix-rounding problem with an approximation ratio smaller than
2 for the same distance.



1 Introduction 

Given a real number $a$, its rounding is either $\lfloor a \rfloor$ or $\lceil a \rceil$. Given a $d$-dimensional
$n \times n \times \cdots \times n$ array ($n^d$ array) $A = (a_{i_1,i_2,\ldots,i_d})_{1 \le i_j \le n}$ of real numbers, its
rounding is an integral array $B = (b_{i_1,i_2,\ldots,i_d})_{1 \le i_j \le n}$ such that each entry
$b_{i_1,i_2,\ldots,i_d}$ is a rounding of $a_{i_1,i_2,\ldots,i_d}$. Without loss of generality, we assume that
each entry of $A$ is in the closed interval $[0,1]$. Such an array is called a
$[0,1]$-valued array. Thus, a rounding of $A$ becomes a binary array.

Given an array $A$, there are $2^{n^d}$ possible roundings, among which a
“good-quality” rounding is desired. In order to give a criterion to evaluate the quality
of a rounding, we define a distance in the space $\mathcal{A}$ of all $[0,1]$-valued $n^d$ arrays.
An orthogonal region $R$ in the $d$-dimensional integral grid is a Cartesian
product $I_1 \times I_2 \times \cdots \times I_d$ of integral subintervals $I_j$ of $[1,n]$ for $j = 1,2,\ldots,d$. For
an element $A \in \mathcal{A}$, let $A(R)$ be the sum of entries of $A$ located in the orthogonal
region $R$. Given a family $\mathcal{F}$ of orthogonal regions, the associated $l_p$ distance
$\mathrm{Dist}_p^{\mathcal{F}}(A,A')$ between two elements $A$ and $A'$ in $\mathcal{A}$ is defined by

$$\mathrm{Dist}_p^{\mathcal{F}}(A,A') = \Big[ \sum_{R \in \mathcal{F}} |A(R) - A'(R)|^p \Big]^{1/p}.$$

The $l_\infty$ distance with respect to $\mathcal{F}$ is defined by

$$\mathrm{Dist}_\infty^{\mathcal{F}}(A,A') = \lim_{p \to \infty} \mathrm{Dist}_p^{\mathcal{F}}(A,A') = \max_{R \in \mathcal{F}} |A(R) - A'(R)|.$$

More generally, we could consider a positive-valued function $w$ on $\mathcal{F}$ and define
the weighted $l_\infty$ distance

$$\mathrm{Dist}_\infty^{\mathcal{F},w}(A,A') = \max_{R \in \mathcal{F}} |(A(R) - A'(R))\, w(R)|.$$
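
To make the definitions concrete, the following Python sketch evaluates the $l_p$ and weighted $l_\infty$ distances for matrices given as nested lists. It is only an illustration: the encoding of a region as a pair of inclusive, 0-based index ranges and the helper names `region_sum`, `dist_p`, `dist_inf` and `squares` are assumptions of this example.

```python
def region_sum(M, region):
    """Sum of the entries of M in region ((r0, r1), (c0, c1)), bounds inclusive."""
    (r0, r1), (c0, c1) = region
    return sum(M[i][j] for i in range(r0, r1 + 1) for j in range(c0, c1 + 1))


def dist_p(A, B, family, p):
    """l_p distance with respect to a family of regions."""
    total = sum(abs(region_sum(A, R) - region_sum(B, R)) ** p for R in family)
    return total ** (1 / p)


def dist_inf(A, B, family, weight=lambda R: 1):
    """Weighted l_infinity distance; the default weight gives the unweighted one."""
    return max(abs(region_sum(A, R) - region_sum(B, R)) * weight(R) for R in family)


def squares(n, k):
    """All k x k square regions of an n x n grid (the family W_k)."""
    return [((i, i + k - 1), (j, j + k - 1))
            for i in range(n - k + 1) for j in range(n - k + 1)]


A = [[0.5, 0.5], [0.5, 0.5]]
B = [[1, 0], [0, 1]]
print(dist_inf(A, B, squares(2, 1)), dist_inf(A, B, squares(2, 2)))
```

The example shows why the choice of family matters: the same rounding has error 0.5 on single cells but error 0 on the one $2 \times 2$ square.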

Let $\mathcal{B}$ be the set of all binary arrays in $\mathcal{A}$. Given a $[0,1]$-valued array
$A$, an optimal rounding of $A$ is a binary $n^d$ array $B$ in $\mathcal{B}$ that is closest to
$A$ in the sense of the above-defined distance. The distance between $A$ and its
optimal rounding is referred to as the optimal rounding error. In this paper, we
are mainly concerned with the weighted and unweighted $l_\infty$ distances.

The supremum $\sup_{A \in \mathcal{A}} \min_{B \in \mathcal{B}} \mathrm{Dist}_\infty^{\mathcal{F}}(A,B)$ of the optimal rounding error
is called the inhomogeneous discrepancy of $\mathcal{A}$ with respect to the family $\mathcal{F}$ or
with respect to the distance $\mathrm{Dist}_\infty^{\mathcal{F}}$. (See Beck and Sós.) We deal with the
following problems:

Problem 1 (Discrepancy problem). Give combinatorial upper and lower
bounds of the inhomogeneous discrepancy of $\mathcal{A}$ with respect to $\mathrm{Dist}_\infty^{\mathcal{F}}$.

Problem 2 (Optimization problem). How computationally hard is the
problem of computing an optimal rounding of a $[0,1]$-valued $n^d$ array?

The discrepancy problem is a classical topic in combinatorics, and our main
focus is on the optimization problem. In particular, we consider two special cases
where $d = 1$ and $d = 2$, which are called the sequence-rounding problem and the
matrix-rounding problem, respectively. These problems are not only
combinatorially interesting but also related to coding theory, data compression, computer
vision, operations research, and Monte Carlo simulation.

For the sequence-rounding problem, the inhomogeneous discrepancy with
respect to $\mathrm{Dist}_\infty^{\mathcal{F}}$ is at most 1 for any family $\mathcal{F}$ of intervals. On the other hand, it
can be 1 even if we consider the family of all intervals of length 2. Therefore, the
discrepancy problem is easily settled for sequence rounding. On the other
hand, to the authors' knowledge, the optimization problem has not been
addressed well in the literature. Viterbi considered the $l_1$ distance with respect
to the family of all intervals of length $k$ in application to a decoding problem,
and proposed an $O(2^k n)$ time algorithm. Although a similar algorithm could be
applied to the $l_\infty$ distance, the time complexity would be exponential in general.
We show in this paper that an optimal rounding of any sequence with respect to
any weighted $l_\infty$ distance can be computed in $O(\sqrt{n}\,|\mathcal{F}| \log^2 n)$ time. The time
complexity is polynomial since $|\mathcal{F}| = O(n^2)$. We also give an $O(k^2 n \log n)$ time
algorithm if the maximal length of the intervals of $\mathcal{F}$ is $k$.

For the matrix-rounding problem, the inhomogeneous discrepancy with
respect to $\mathrm{Dist}_\infty^{\mathcal{F}}$ highly depends on the choice of the family $\mathcal{F}$ of regions: If $\mathcal{F}$
is the set of all orthogonal regions, an $O(\log^3 n)$ upper bound and an $\Omega(\log n)$
lower bound are known. On the other hand, Baranyai showed that the
inhomogeneous discrepancy is less than 1 if $\mathcal{F}$ consists of $2n + 1$ regions
corresponding to all rows, all columns and the whole matrix (applications of this
result are discussed in the literature). Moreover, it is known that the inhomogeneous discrepancy is less
than 2 for a set $\mathcal{F}$ consisting of intervals in any two different scanning orderings
on the entries of the matrix.

In this paper, motivated by an application to digital halftoning, we would
like to consider the family $\mathcal{W}_k$ consisting of all $k \times k$ square regions for a
small $k$. An $O(\log^3 k)$ upper bound and an $\Omega(\log k)$ lower bound of the
inhomogeneous discrepancy can be easily obtained from the above-mentioned
known results. Our main results for matrix rounding are on the family
$\mathcal{W}_2$. We give a nontrivial 1.75 upper bound for the inhomogeneous
discrepancy $\sup_{A \in \mathcal{A}} \min_{B \in \mathcal{B}} \mathrm{Dist}_\infty^{\mathcal{W}_2}(A,B)$, whereas we prove that it is NP-hard to
approximate the rounding error with respect to $\mathrm{Dist}_\infty^{\mathcal{W}_2}$ within the factor 2. The
NP-hardness result is generalized for $\mathcal{W}_{2k}$ for any natural number $k$.

Our motivation comes from digital halftoning, which is one of the most
fundamental techniques in image processing. An intensity image can be considered
as a $[0,1]$-valued $n \times n$ array $A$ where each entry $a_{ij}$ corresponds to a brightness
level (gray level) of the $(i,j)$ pixel of the pixel grid. Its digital halftoning is a
binary $n \times n$ array $B$ “approximating” $A$. The intention of this method is to
convert a given image which consists of several bits for brightness levels into
a binary image having only black and white pixels. This kind of technique is
indispensable to print an image on an output device that produces black dots
only, such as facsimiles and laser printers.



Up to now, a large number of algorithms for digital halftoning have been
proposed. A comprehensive summary of the results
obtained in the literature can be found in the Ph.D. thesis by Ulichney.
However, there have been few studies discussing reasonable criteria for
evaluating the quality of an output image, maybe because the problem itself is very
practically oriented. Actually, the most common criterion in digital halftoning is
to judge the quality of output pictures by human eyes. It is desirable to establish
a good evaluation system of halftoning methods (instead of the “human eye's
judgment”), and to handle the digital halftoning problem fully mathematically.
The idea of using discrepancy for measuring the smoothness of halftoning has
been proposed before; however, to the authors' knowledge, the discrepancy with
respect to families of small regions and its computational aspects have not been
well studied before.



Imagine that we look at some pixel $(i,j)$ of a gray-level image $A$. What
happens is that we actually perceive an average of gray levels of some small
neighborhood of that point. Using the same observation, the intensity around the
pixel $(i,j)$ of a binary image is proportional to the number of white points in
the corresponding neighborhood. Therefore, density values should be roughly
equal around any pixel between an output binary image and the input image $A$.
The observation motivates us to consider the family $\mathcal{W}_k$ for a small $k$. Indeed,
a weighted $l_\infty$ distance for such a family seems to be a good criterion for
the digital halftoning problem. If an optimal rounding with respect to the
above-mentioned distance could be computed in polynomial time, we could design
an ideal automatic digital halftoning system with a concrete mathematical
criterion. Unfortunately, our NP-hardness result implies that we need heuristics to
solve the digital halftoning problem formulated as a matrix-rounding problem.
One popular heuristic approach is to transform the digital halftoning problem
into a one-dimensional problem by using a space-filling curve generated in a
somewhat random manner, where we can apply our sequence-rounding algorithm
to solve the one-dimensional problem.

2 Sequence-Rounding Problem 

2.1 Supremum of the Optimal Rounding Error 

Let $a = (a_1, a_2, \ldots, a_n)$ be our input sequence such that $0 \le a_j \le 1$ for all
$j \in \{1,2,\ldots,n\}$.

A popular algorithm used in digital halftoning to round such a sequence $a$
is the error diffusion algorithm, which computes the binary sequence $b$ from $b_1$
to $b_n$ greedily in an incremental fashion. We always keep the difference $S_j =
\sum_{i=1}^{j} (a_i - b_i)$. If we have already computed $b_1$ through $b_j$, we determine $b_{j+1}$
to be 1 if $S_j + a_{j+1} > 0.5$ and to be 0 otherwise. It can be easily seen that
$-0.5 < S_j \le 0.5$ always holds, and hence for any interval $I = [s,t]$,
$|\sum_{i \in I} (a_i - b_i)| = |S_t - S_{s-1}| < 1$. Therefore, the supremum of the optimal rounding error
$\mathrm{Dist}_\infty^{\mathcal{I}}(a,b)$ is at most 1 for any family $\mathcal{I}$ of intervals. On the other hand, there
is an example that the supremum becomes 1 even if each interval has length 2.
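
The error diffusion rule described above is short enough to state as code. The sketch below is a minimal illustration (the names `error_diffusion` and `max_interval_error` are hypothetical); it uses exact rational arithmetic so that the comparison with 0.5 is not affected by floating-point rounding, and it checks the claimed bound $|S_t - S_{s-1}| < 1$ over all intervals.

```python
from fractions import Fraction


def error_diffusion(a):
    """Greedy rounding: b_{j+1} = 1 iff S_j + a_{j+1} > 1/2."""
    b, S = [], Fraction(0)
    for x in a:
        x = Fraction(x)
        bit = 1 if S + x > Fraction(1, 2) else 0
        b.append(bit)
        S += x - bit            # S_j = sum_{i <= j} (a_i - b_i)
    return b


def max_interval_error(a, b):
    """max over all intervals [s, t] of |sum_{i in [s, t]} (a_i - b_i)|."""
    prefix = [Fraction(0)]
    for x, bit in zip(a, b):
        prefix.append(prefix[-1] + Fraction(x) - bit)
    return max(abs(prefix[t] - prefix[s])
               for s in range(len(prefix)) for t in range(s + 1, len(prefix)))


a = [Fraction(3, 10), Fraction(9, 10), Fraction(1, 5), Fraction(7, 10), Fraction(1, 2)]
b = error_diffusion(a)
print(b, max_interval_error(a, b))
```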

Proposition 1. If we consider the family $\mathcal{I}$ of all intervals of length 2, there
exists an input sequence $a$ for which there is no binary sequence $b$ attaining
$\mathrm{Dist}_\infty^{\mathcal{I}}(a,b) < 1 - 1/(n-1)$.

The proof is given in Appendix 1. 

2.2 Finding an Optimal Rounding — Known Results 

The error diffusion algorithm computes an optimal rounding of $a$ with respect
to $\mathrm{Dist}_\infty^{\mathcal{I}}$ if $\mathcal{I} = \{[1,i] : i = 1,2,\ldots,n\}$. However, it does not always find an
optimal rounding for a general $l_\infty$ distance. Moreover, we would like to deal with
weighted $l_\infty$ distances. If $\mathcal{I}$ consists of all intervals of length $k$ for a constant $k$,
it is relatively easy to design a linear time algorithm with respect to $n$ by using
Viterbi's algorithm based on dynamic programming (more precisely, $O(2^k n)$ time
using $O(2^k n)$ space) to compute an optimal rounding of $a$. The space complexity
can be reduced to $O(2^k \sqrt{n} + n)$ while keeping the time complexity, and it can
be further reduced to $O(2^k + n)$ if we spend $O(2^k n \log n)$ time (a similar technique
is found in the literature). However, it is nontrivial to design an efficient algorithm which
is polynomial both in $n$ and $k$.

2.3 Polynomial Time Algorithms 

In this subsection, we give a polynomial time algorithm for computing an optimal
rounding of a sequence $a$ with respect to the weighted $l_\infty$ distance $\mathrm{Dist}_\infty^{\mathcal{I},w}$ for a
general set $\mathcal{I}$ of intervals and any weight function $w$ on $\mathcal{I}$. Without loss of
generality, we can assume that all entries of $a$ are in the open interval $(0,1)$, since we
can ignore integral entries to solve the problem. Since we consider the weighted
$l_\infty$ distance, an optimal rounding of $a$ is a binary sequence $b = (b_1, b_2, \ldots, b_n)$
which is the solution to the following integer programming problem, where $z$
corresponds to the optimal distance between $a$ and $b$:

$$\begin{array}{lll}
\text{minimize} & z & \\
\text{subject to} & -z \le w(I) \sum_{j \in I} (a_j - b_j) \le z & (\forall I \in \mathcal{I}), \qquad (1) \\
& b_j \in \{0,1\} & (\forall j \in \{1,2,\ldots,n\}).
\end{array}$$

Recall that the weight function is always positive, i.e. $w(I) > 0$ for any $I \in \mathcal{I}$.
Since the variables $b_j$ are all 0-1 valued, we can replace the inequality
constraints (1) by

$$\min\Big\{ \Big\lfloor w(I)^{-1} z + \sum_{j \in I} a_j \Big\rfloor,\ n \Big\} \ \ge\ \sum_{j \in I} b_j \ \ge\ \max\Big\{ \Big\lceil -w(I)^{-1} z + \sum_{j \in I} a_j \Big\rceil,\ 0 \Big\} \qquad (\forall I \in \mathcal{I}).$$

We introduce the variables $x_0, \ldots, x_n$ satisfying $x_i - x_0 = b_1 + \cdots + b_i$ for
$i \in \{1,2,\ldots,n\}$. For each interval $I$, the indices of its first entry and last entry
are denoted by $s(I)+1$ and $t(I)$; in other words, $I = [s(I)+1, t(I)] = (s(I), t(I)]$.
Then the above problem is transformed into the following problem:

$$\begin{array}{llll}
\text{minimize} & z & & \\
\text{subject to} & x_{t(I)} - x_{s(I)} \le \min\{ \lfloor w(I)^{-1} z + \sum_{j \in I} a_j \rfloor,\ n \} & (\forall I \in \mathcal{I}), & (2) \\
& x_{s(I)} - x_{t(I)} \le -\max\{ \lceil -w(I)^{-1} z + \sum_{j \in I} a_j \rceil,\ 0 \} & (\forall I \in \mathcal{I}), & (3) \\
& x_j - x_{j-1} \le 1 & (\forall j \in \{1,2,\ldots,n\}), & (4) \\
& x_{j-1} - x_j \le 0 & (\forall j \in \{1,2,\ldots,n\}), & (5) \\
& x_j \text{ is an integer} & (\forall j \in \{0,1,2,\ldots,n\}). &
\end{array}$$

For the time being, we concentrate on the decision problem: checking
the existence of a vector $(x_0, x_1, \ldots, x_n)$ satisfying the above constraints when
$z$ is fixed. It is discussed later how to find the optimal value of $z$.

From the above integer programming formulation, the decision problem is
an integer program on a system of difference constraints, and it is well
known that the problem can be transformed to that of detecting a negative cycle in a
graph. In order to make the paper self-contained, we give the construction
of the graph:

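Let $H = (N, E)$ be a directed graph with a vertex set $N = \{0,1,2,\ldots,n\}$
and an arc set $E = E_1 \cup E_2 \cup E_3 \cup E_4$ where

$$\begin{array}{l}
E_1 = \{(s(I), t(I)) : I \in \mathcal{I}\}, \\
E_2 = \{(t(I), s(I)) : I \in \mathcal{I}\}, \\
E_3 = \{(0,1), (1,2), \ldots, (n-1,n)\}, \\
E_4 = \{(1,0), (2,1), \ldots, (n,n-1)\}.
\end{array}$$

The arc weight $w_{ij}$ of an arc $(i,j) \in E$ is defined by

$$w_{ij} = \begin{cases}
\min\{ \lfloor w(I)^{-1} z + \sum_{j \in I} a_j \rfloor,\ n \} & \text{if } (i,j) = (s(I), t(I)) \in E_1, \\
-\max\{ \lceil -w(I)^{-1} z + \sum_{j \in I} a_j \rceil,\ 0 \} & \text{if } (i,j) = (t(I), s(I)) \in E_2, \\
1 & \text{if } (i,j) \in E_3, \\
0 & \text{if } (i,j) \in E_4,
\end{cases}$$

where the variable $z$ is fixed. The arc sets $E_1$, $E_2$, $E_3$, $E_4$ correspond to the
constraints (2), (3), (4), (5), respectively. Note that the graph has $O(n + |\mathcal{I}|)$
arcs.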

A negative cycle is an elementary directed cycle $C$ such that the total
sum of the arc weights in $C$ is negative. The detection of a negative cycle in $H$ can
be done in $O(n^{0.5} m \log(nL))$ time by using Gabow-Tarjan's scaling algorithm for
the assignment problem, where $m$ is the number of edges and $L$ is the maximum
weight. By definition, $m = O(n + |\mathcal{I}|)$ and $\log(nL) = O(\log n)$ in our graph.

If there exists a negative cycle in $H$, then the inequality system (2), (3), (4),
(5) is infeasible. On the other hand, if the graph contains no negative cycle, the
shortest path length $x_i^*$ from the vertex $i$ to $n$ is well-defined and integer valued,
and the vector $(x_i^*)$, $i = 1,2,\ldots,n$, satisfies the inequalities (2), (3), (4), and (5).
The radix heap implementation of Dijkstra's algorithm finds the path lengths
in $O(m + n\log(nL)) = O(|\mathcal{I}| + n \log n)$ time.
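
The decision problem for a fixed $z$ can be illustrated with a small program. The sketch below builds the arcs for constraints (2)-(5) and tests feasibility with plain Bellman-Ford relaxation, which is swapped in here for simplicity and is slower than the Gabow-Tarjan and radix-heap methods named in the text. The function name `feasible_rounding`, the encoding of an interval as a triple `(s, t, weight)` meaning $(s, t]$, and the use of shortest distances from an implicit source are assumptions of this example. It returns a rounding $b$ when one exists and `None` when a negative cycle is found.

```python
from fractions import Fraction
from math import ceil, floor


def feasible_rounding(a, intervals, z):
    """Return a 0/1 list b with weighted interval error <= z, or None."""
    n = len(a)
    prefix = [Fraction(0)]
    for value in a:
        prefix.append(prefix[-1] + Fraction(value))

    arcs = []                                   # (u, v, c) encodes x_v - x_u <= c
    for s, t, weight in intervals:
        total = prefix[t] - prefix[s]           # sum of a_j over the interval (s, t]
        slack = Fraction(z) / weight
        arcs.append((s, t, min(floor(slack + total), n)))    # constraint (2)
        arcs.append((t, s, -max(ceil(-slack + total), 0)))   # constraint (3)
    for j in range(1, n + 1):
        arcs.append((j - 1, j, 1))              # constraint (4)
        arcs.append((j, j - 1, 0))              # constraint (5)

    dist = [0] * (n + 1)                        # implicit source at distance 0 to all
    for _ in range(n + 1):
        changed = False
        for u, v, c in arcs:
            if dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
                changed = True
        if not changed:
            break
    else:
        return None                             # still relaxing: negative cycle
    return [dist[j] - dist[j - 1] for j in range(1, n + 1)]


half = Fraction(1, 2)
intervals = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
print(feasible_rounding([half, half], intervals, half))
print(feasible_rounding([half, half], intervals, Fraction(2, 5)))
```

Because every arc weight is an integer, the computed potentials are integers, and constraints (4) and (5) force consecutive differences to be 0 or 1, so the differences can be read off as the rounding.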

Now, we discuss the method to find the optimal value of z (i.e., smallest z 
causing no negative cycle). We employ the ordinary binary search technique. 
Each edge weight is represented by a step function with respect to z. Thus, 
we only need to consider the break points of the step functions. If we define 
q{h,I) = w{I){h -I- 0.5 -I- for an interval / and an integer u, the set 

$Q = \{q(h,I) \mid I \in \mathcal{I},\ -n \le h \le n\}$ contains all the break points. By applying
the binary search technique (with some care), we can find the optimal value of $z$ by
executing the above negative cycle detection algorithm $O(\log(n|\mathcal{I}|)) = O(\log n)$
times with additional $O((n + |\mathcal{I}|) \log n)$ time for each search process. Thus we
can find the optimal value of $z$ in $O(n^{0.5}(n + |\mathcal{I}|) \log^2 n)$ time in total. Hence,
we have the following theorem:

Theorem 1. An optimal rounding of a sequence with respect to the distance
$\mathrm{Dist}_\infty^{\mathcal{I},w}$ can be computed in $O(n^{0.5}(n + |\mathcal{I}|) \log^2 n)$ time. The space requirement
is $O(n + |\mathcal{I}|)$.

We can design a better algorithm if each interval is short.

Theorem 2. An optimal rounding of a sequence can be computed in $O(k^2 n \log n)$
time using $O(n + k^2 + |\mathcal{I}|)$ space if the family $\mathcal{I}$ is a set consisting of intervals
of length at most $k$.

Proof. First, we give an O(fc^n) time algorithm for checking the existence of neg- 
ative cycles of H. For each index p G {0,1,2,..., n}, the subgraph of H induced 
by the vertices {0, l,...,p} is denoted by Hp. It is clear that Hp is strongly 
connected. For each triplet of vertices (i,j,p) satisfying i,j G {p — k, . . . ,p} and 
k < p, d{i,j,p) denotes the shortest path length from i to j in the graph Hp 
when Hp does not contain any negative cycle. We first find a negative cycle in 
Hk, if it exists. If the graph Hk does not contain any negative cycle, we calcu- 
late the values {d{i,j, k) | t, j G (0 , . . . , k}} by using an all-pairs shortest path 
algorithm. Gabow-Tarjan’s algorithm for assignment problems solves the nega- 
tive cycle detection problem in log n) = 0{k‘^n) time. If we transform 

arc weights by using an optimal dual solution to the assignment problem ob- 
tained by Gabow-Tarjan’s algorithm, we only need to solve an all-pairs shortest 
path problem defined on a network with non-negative arc weights. By applying 
the radix heap implementation of Dijkstra’s algorithm k times, the computa- 
tional requirement is bounded by 0{k{k + \I\ + fclogn)) = 0{kfn) time, since 
\I\ = O(nfc). 

Suppose that Hp has no negative cycle, and we have already computed 
|(i(i,j, to) \ i,j G (0, . . . ,to}} for all to < p. If the graph i?p+i has no negative 
cycle. Then we can calculate the values {d{i,j,p+l) \ i,j G {p—k+1, . . . ,p-|-l}} 
from the values {d{i,j,p) \ i,j G {p — k,. . . ,p}} easily. It is because, a shortest 
path P from i to j in i/p+i satisfies one of the following two conditions; (1) P 
is contained in Hp, or (2) P is partitioned into a sequence of three subpaths 
{Pi,P 2 ,Pz) such that P\ and P3 are contained in Hp and P 2 is a path con- 
sisting of two arcs Oi and 02 where ai and 02 shares the vertex p -I- 1. Since 
both the in-degree and out-degree of each vertex are bounded by 0(fc), we can 
calculate the values {d{i,j,p+ 1) | i,j G {p — k + 1, . . . ,p + 1}} using the val- 
ues {d(i,j,p) I i,j G (p — fc, . . . ,p}} in 0{k^) time if we maintain the values 
{d{i,j,p) I, j G {p-k,... ,p}} by an {k + 1) x {k + 1) array. 

On the other hand, if i?p+i has a negative cycle C*, C* must contain the 
vertex p-|-l. Since we have the shortest path length in Hp between each (ordered) 
pair of vertices in {p — k,p — k + 1, . . . ,p} and each vertex adjacent to p -I- 1 in 
i/p+i must be in {p — k, p—k+1, . . . ,p}, we can check the existence of a negative 
cycle easily. Since the degree of each vertex is bounded by a constant, we can 
check the existence of a negative cycle in constant time. The above observations 
imply that we can check the existence of a negative cycle of H in O(fc^n) time. 

When the variable 2: is fixed and the graph H does not have any negative 
cycle, we can find the shortest path lengths x* for i = 1, 2 , . . . , n — 1 in 0{k‘^n) 
time by using a similar argument. Replacing the + |I|)logn) time 

algorithm for negative cycle detection with the above algorithm leads to an
O(k^2 n log n) time algorithm.
3 Matrix-Rounding Problem

3.1 Discrepancy Problem 

The following theorem is well known:

Theorem 3. The inhomogeneous discrepancy of a [0,1]-valued n × n matrix
with respect to the family of all rectangular regions is O(log^3 n) and Ω(log n).
The same bounds hold for the inhomogeneous discrepancy for the family of all
rectangular regions containing the left-upper corner entry of the matrix.

We are interested in the matrix-rounding with respect to the set W_k of all
k × k square regions. The following proposition is obtained in a straightforward
manner from the theorem above:

Proposition 2. The inhomogeneous discrepancy with respect to W_k is O(log^3 k)
and Ω(log k). Indeed, these bounds also hold for the union ⋃_{j=1}^{k} W_j.

Proof. Without loss of generality, we assume that k divides n, and subdivide
the grid into (n/k)^2 subgrids of size k × k. Then, each element of ⋃_{j=1}^{k} W_j
is a union of four rectangles in subgrids, and hence we have an O(log^3 k) bound
from Theorem 3.

In order to show the lower bound, consider a 2k × 2k matrix A whose lower-right
quarter is an Ω(log k) instance for the discrepancy with respect to all
rectangular regions in the quarter containing the entry a_{k+1,k+1}. The remaining
part of A is filled with zero entries. For any given rectangular region R containing
a_{k+1,k+1} in the lower-right quarter, there exists a region in W_k consisting of R
and zero entries. Hence we have the lower bound for W_k.

We remark that a polynomial time algorithm for computing a rounding with
an O(log^3 k) discrepancy can be designed based on the proof of Theorem 6.13 in
eg. This is theoretically better than the popular two-dimensional error diffusion 
algorithm, for which the rounding error can become k (see Appendix 2). 

For the family W_2 consisting of all 2 × 2 square regions, there exists an
instance A for which the discrepancy is exactly 1. However, the authors do not know
whether there exists an instance requiring Dist_∞^{W_2}(A, B) > 1 or not. It is easy
to show that the inhomogeneous discrepancy with respect to W_2 is at most 2;
indeed, the checkerboard binary matrix C satisfies Dist_∞^{W_2}(A, C) ≤ 2 for any
input matrix A simultaneously. However, it is nontrivial to give a better upper
bound. We can prove the following result (the proof is involved, and omitted in
this version):

Theorem 4. For any [0,1]-valued matrix A, there exists a binary matrix B
satisfying Dist_∞^{W_2}(A, B) < 1.75.
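The easy bound of 2 for the checkerboard matrix can be checked numerically. The following is a minimal sketch, assuming matrices are given as lists of rows; the helper names `dist_2x2` and `checkerboard` are hypothetical and not taken from the paper. Every 2 × 2 window of the checkerboard has entry sum 2, while the corresponding window of A has entry sum in [0, 4], so the difference never exceeds 2.

```python
import random

def dist_2x2(A, B):
    """Largest |sum of A - sum of B| over all 2 x 2 windows."""
    n, m = len(A), len(A[0])
    return max(abs(sum(A[i + di][j + dj] - B[i + di][j + dj]
                       for di in (0, 1) for dj in (0, 1)))
               for i in range(n - 1) for j in range(m - 1))

def checkerboard(n, m):
    """Binary matrix with entry (i + j) mod 2; every 2 x 2 window sums to 2."""
    return [[(i + j) % 2 for j in range(m)] for i in range(n)]

random.seed(0)
A = [[random.random() for _ in range(8)] for _ in range(8)]
worst = dist_2x2(A, checkerboard(8, 8))   # never exceeds 2
```

The all-zero and all-one inputs attain the value 2 exactly, which is why improving the bound requires a rounding that depends on A.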
3.2 NP-hardness of Computing an Optimal Matrix Rounding

In this subsection, we prove the following theorem: 

Theorem 5. For any ε > 0, it is NP-hard to decide whether the optimal rounding
error of a given matrix A is greater than 1 − ε or less than 1/2 + ε with
respect to the distance Dist_∞^{W_2}.

For simplicity, we write Dist(A, B) for the distance Dist_∞^{W_2}(A, B). Each
entry of our hardness instance A has one of three values: 0, 1, and 1/2, where we
use a convention that we can round 0 to 1 and 1 to 0. To make the proof
mathematically formal, we should replace 0 and 1 with δ and 1 − δ for an infinitesimally
small positive number δ satisfying δ < ε/4. This is the reason why ε appears in
the statement of the theorem. By using the above-mentioned convention, we
ignore ε and δ in the proof.

We prepare some useful definitions and a lemma. A zero-entry of A is called 
an absolute-zero entry if it is contained in a 2 x 2 square such that all of its 
entries are zeros. A pair of two 1/2 entries is called a good pair if there exists 
a 2 X 2 square region consisting of the pair and two absolute-zero entries. The 
following lemma is immediate: 

Lemma 1. If Dist(A, B) ≤ 1/2, each absolute-zero entry must become 0 in B.
Moreover, each good pair must become a pair of 0 and 1 in B.

We prove the theorem by using a reduction from the planar 3-SAT problem.
An instance of planar 3-SAT is a Boolean expression E = E_1 ∧ E_2 ∧ ... ∧ E_m,
where each clause E_i contains at most three literals, which are
variables or their negations, and a planar graph is defined by the vertex set
{E_1, E_2, ..., E_m, u_1, u_2, ..., u_q} and the edge set {(E_i, u_j) | E_i contains u_j or ¬u_j}.
The nodes E_i (i = 1, 2, ..., m) are called the clause nodes, while the nodes u_j
(j = 1, 2, ..., q) are called the literal nodes. Then, the problem is to decide
whether there exists an assignment F ⊂ {u_1, ¬u_1, u_2, ¬u_2, ..., u_q, ¬u_q} making the
expression E true.

A polynomial time reduction from a planar 3-SAT problem to the corre- 
sponding optimal halftoning is established as follows: Suppose that we are given 
a Boolean expression E of the above form together with a planar graph defined 
above. It is well-known that the planar graph representing the expression E can 
be replaced with a graph G embedded in a pixel grid of size polynomial in the 
total number of clauses and variables. Two pixels (i, j) and (i', j') in the grid are
called adjacent if |i − i'| ≤ 1 and |j − j'| ≤ 1. In other words, they are located
in a common region in W_2. Each edge of the graph G is represented by a series
of adjacent pixels. There are three kinds of nodes of G, literal nodes, branching 
nodes and clause nodes. We can assume that there is enough grid space between 
each pair of nodes. In the following, we further modify the graph G such that 
the SAT assignment information is represented by using a [0, 1] -valued matrix 
A, such that E is satisfiable if the optimal rounding error of A is at most 1/2, 
otherwise it is at least 1. 
For each variable node (i, j) associated with u_s, we set a_{ij} = 1/2. If we assign
0 to the variable u_s, then b_{ij} = 0, else b_{ij} = 1. An edge between two nodes is
a path consisting of adjacent half-entries. Moreover, each pair of adjacent pixels
must form a good pair. Figure 1 shows our gadget representing an edge of G
between two nodes X and Y. For convenience, we omit zero-entries
in the figures, and each half-entry is represented by an h (meaning "half").
The nodes X and Y also have values 1/2. Note that the direction of the edge
can be bent to any of four possible slopes if we have sufficient open space.

From Lemma 1, if there exists an approximation B of A with distance (at
most) 1/2, there are only two possible assignments. One is shown in Figure 1,
and the other is its opposite. This means that the value b_Y of B at Y is
uniquely determined by b_X. In the case of Figure 1, b_Y = b_X. However, we can
define another path, shown in Figure 2, forcing b_Y = ¬b_X; thus, it creates the
negation of a variable. Indeed, we can make both an odd-length path and an
even-length path between two nodes to control the assignment of the literal in
each clause. Our gadget representing a branching point node of G is illustrated
in Figure 3. Note that all zero-entries of the input matrix (left-side drawing) are
absolute-zero entries. What remains to show is that we can simulate a clause node
in the planar 3-SAT instance. Let the clause be x ∨ y ∨ z, where x, y and z are
literals or their negations. First, we make a weaker gadget, which corresponds to
(x ∨ y ∨ z) ∧ (¬x ∨ ¬y ∨ ¬z) (a not-all-equal-3SAT clause). The left drawing of Figure 4
illustrates the assignment of 1/2 entries in the input matrix A. The values of
the matrix B at X, Y, and Z correspond to the Boolean values of x, y, and z,
respectively.

Fig. 1. A gadget representing an edge of G which makes b_X = b_Y.

Fig. 2. A gadget representing an edge of G which makes b_Y the negation of b_X.

Fig. 3. A gadget for a branching node.

If Dist(A, B) < 1, once we fix the values at X, Y, and Z in B, all values
except the h at the center crossing are uniquely determined. If X = Y = Z = 0,
the matrix B is the one illustrated in the right drawing of Figure 4. We will
show that there is no possible assignment at the pixel p with the ? mark if
Dist(A, B) < 1: Since the 2 × 2 matrix containing p as its south-east corner
has the entry sum 3/2 in A, its entry sum in B must be either 1 or 2. Hence,
the value at p must be 0 in B. However, the 2 × 2 matrix containing p as its
north-west corner also has the entry sum 3/2 in A, and hence the value at p
must be 1 in B. This is a contradiction. We can see that X = Y = Z = 1
is another impossible assignment to make Dist(A, B) < 1; on the other hand,
for all other assignments of X, Y, and Z, we can find a rounding B satisfying
Dist(A, B) = 1/2. Next, we modify the above gadget to the matrix in the left
drawing of Figure 5. It is easy to see that if X = Y = Z = 0, there is no B
such that Dist(A, B) < 1. However, if X = Y = Z = 1, the right-hand side
drawing shows that it is possible to make Dist(A, B) = 1/2. A key difference
is that we have a one-entry in A, which is permitted to become 0 in B without
violating the distance condition. We have constructed all required gadgets, and
thus proved that the planar 3-SAT instance E is satisfiable if and only if there
exists a Boolean matrix B satisfying Dist(A, B) ≤ 1/2. On the other hand, if E
is not satisfiable, then Dist(A, B) ≥ 1. Thus, we have proved the theorem.

Fig. 4. A gadget for a not-all-equal-3SAT clause node.

Fig. 5. A gadget for a clause node representing x ∨ y ∨ z.

Corollary 1. For any k ≥ 1, it is NP-hard to decide whether an optimal rounding
B for an input A satisfies Dist_∞^{W_{2k}}(A, B) > 1 − ε or Dist_∞^{W_{2k}}(A, B) < 1/2 + ε.

Proof. Let A = (a_{ij}) be the instance of 2-approximate hardness for W_2
constructed in the proof of Theorem 5. Then, we define a kn × kn matrix C = (c_{ij})
as follows: We call an entry c_{ij} special if both i and j are divisible by k. The
values of special entries are defined by c_{sk,tk} = a_{s,t} (1 ≤ s ≤ n, 1 ≤ t ≤ n). Other
entries are defined to be zero entries. For any W ∈ W_{2k}, W contains exactly four
special entries, which correspond to entries of A located in a region in W_2. It
can be seen that flipping a non-special entry to 1 forces the rounding error to
be at least 1 − ε. Therefore, we have the corollary.

4 Concluding Remarks 

We have considered sequence-rounding and matrix-rounding problems. The
sequence-rounding problem has been solved well; in particular, the optimization
problem can be solved in polynomial time for the weighted l_∞ distance. As for
the matrix-rounding problem, many problems are left open. We especially want 
to design a nice approximation algorithm for the matrix-rounding problem with 
respect to Disf^’‘ . 
Appendix 1. Proof of Proposition 1

Proof. Consider a sequence a = (a_i), i = 1, 2, ..., 2p, defined by 0, 1/(4p−1),
(4p−3)/(4p−1), 3/(4p−1), (4p−5)/(4p−1), ..., (2p−2)/(4p−1), 2p/(4p−1), which
satisfies a_{2j} + a_{2j+1} = (4p−2)/(4p−1) and a_{2j+1} + a_{2j+2} = 4p/(4p−1) for
1 ≤ j ≤ p−1. From the construction it is obvious that there is a unique b
approximating a with distance less than (4p−2)/(4p−1). Since we consider the
intervals of length 2, this means that |a_i + a_{i+1} − b_i − b_{i+1}| < (4p−2)/(4p−1) for
i = 1, 2, ..., 2p−1. Indeed, b = 0, 0, 1, 0, 1, 0, 1, 0, ..., 1, 0, 1, which is an alternating
sequence except for the first two zeros.
Let a' be the reversed sequence of a, and consider the concatenation of a and
a'. Naturally, the only possible approximation must be the concatenation of b
and its reverse b'. However, the sequences meet at the middle of the whole input
sequence so that 2p/(4p−1) and 2p/(4p−1) are adjacent, and the corresponding
outputs are 1 and 1. Thus, the difference of the entry sums in this neighborhood
is 2 − 4p/(4p−1) = (4p−2)/(4p−1). Hence, it is impossible to approximate it
within the distance of (4p−2)/(4p−1), which proves the proposition.

Appendix 2. Performance of an Error Diffusion Algorithm. 

A popular practical method in digital halftoning is the error diffusion algorithm 
(we have already seen its one-dimensional version): Scan the matrix array A in 
the scan-line order (i.e., scan row-wise from top row to bottom row, from left 
to right in each row), and greedily round the entries of A into binary values 
propagating the remaining error at each visited entry to its unvisited neighbor 
entries. The outline of the algorithm is as follows: We use four parameters α, β, γ,
and δ satisfying α + β + γ + δ = 1. At first, error(i, j) is initialized as 0 for
each pixel (i, j) in the grid. When we visit the pixel (i, j), b_{ij} is obtained by
rounding a_{ij} + error(i, j) to the nearer binary value. Now, compute rem(i, j) =
a_{ij} + error(i, j) − b_{ij}, which is the error remaining at (i, j), and distribute the
error values to its unvisited neighbors with predetermined weights. Formally,
those errors are updated as follows:

    error(i, j+1)   := error(i, j+1)   + α rem(i, j),
    error(i+1, j)   := error(i+1, j)   + β rem(i, j),
    error(i+1, j+1) := error(i+1, j+1) + γ rem(i, j),
    error(i+1, j−1) := error(i+1, j−1) + δ rem(i, j).

We need some care for pixels near the boundary of G, but we ignore it here for
simplicity.
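The update rules above translate directly into code. The sketch below is illustrative only: the function name is ours, ties at 1/2 are rounded up, and error that would leave the grid is simply dropped, which is one possible reading of the boundary handling that the text leaves unspecified.

```python
from fractions import Fraction

def error_diffusion(A, alpha, beta, gamma, delta):
    """Round the matrix A (a list of rows with entries in [0, 1]) to binary."""
    n, m = len(A), len(A[0])
    error = [[0] * m for _ in range(n)]
    B = [[0] * m for _ in range(n)]
    for i in range(n):                    # scan-line order: rows top to bottom
        for j in range(m):                # and left to right within a row
            value = A[i][j] + error[i][j]
            B[i][j] = 1 if 2 * value >= 1 else 0      # ties round up (assumption)
            rem = value - B[i][j]                     # error remaining at (i, j)
            for di, dj, weight in ((0, 1, alpha), (1, 0, beta),
                                   (1, 1, gamma), (1, -1, delta)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < m:       # error leaving the grid is dropped
                    error[ii][jj] += weight * rem
    return B

# Example weights: the classical Floyd-Steinberg choice in the notation above.
fs = (Fraction(7, 16), Fraction(5, 16), Fraction(1, 16), Fraction(3, 16))
half_tone = error_diffusion([[Fraction(1, 2)] * 6 for _ in range(6)], *fs)
```

With α = 1 and β = γ = δ = 0 on a single row, the routine reduces to the one-dimensional version, where every prefix sum of the output differs from that of the input by at most 1/2.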

Proposition 3. Suppose that B is the output of the error diffusion algorithm for
an input A. Then, for the region family W_k, Dist_∞^{W_k}(A, B) ≤ k + (k−1)(γ+δ).
Moreover, there is an instance A such that the above distance exceeds
k + (k−1)(γ+δ) − ε for any ε > 0 if we apply the error diffusion algorithm.

Proof. It is observed that −0.5 ≤ rem(i, j) ≤ 0.5 holds for 1 ≤ i ≤ n and
1 ≤ j ≤ n. Fix a k × k square R in G. Let S_in be the set of entries outside R from
which error is directly propagated to some entry in R. Also, let S_out be the set of
entries in R from which error is directly propagated to some entry outside R.
Considering the total error propagated into R and also the total error propagated
out of R, we have |A(R) − B(R)| ≤ k + (k−1)(γ+δ). On the other hand, we
can manage to create the input attaining rem(i, j) = −1/2 for each (i, j) ∈ S_in,
and rem(i', j') > 1/2 − ε/k for each (i', j') ∈ S_out. This gives the lower bound.
Abstract. We study various computational aspects of the problem of
determining whether a given order contains a given sub-order. Formally,
given a permutation π on k elements, and a permutation σ on n > k elements,
the goal is to determine whether there exists a strictly increasing
function f from [1..k] to [1..n] which is order preserving, i.e., f satisfies
σ(f(i)) > σ(f(j)) whenever π(i) > π(j). We call this decision problem
the Sub-Permutation Problem.

The study falls into two parts. In the first part we develop and analyze
an algorithm (or, rather, an algorithmic paradigm) for this problem. We
show that the complexity of this algorithm is at most O(n^{1+C(π)}), where
C(π) is a naturally defined function of the permutation π.

In the second part we study C(π). In particular, we show that C(π) ≤
0.35k + o(k), implying that the complexity of the Sub-Permutation problem
is O(c_k + n^{0.35k + o(k)}). On the other hand, we prove that for most π's,
C(π) = Ω(k), establishing a lower bound for our algorithm. In addition,
we develop a fast polylogarithmic approximation algorithm for computing
C(π), and bound the value of this parameter for some interesting
families of permutations.



1 Introduction 

The question studied in this paper belongs to the vast family of questions deal- 
ing with finding a specified substructure within a given (large) structure. Our 
structures here are permutations, or, rather, order types of finite numerical se- 
quences. 

Definition 1. Let π be a permutation on k elements, and σ a permutation on n
elements, where n > k. A function f : [1..k] → [1..n] will be called an embedding
of π into σ if it is

1. Strictly increasing, i.e., f(i) > f(j) whenever i > j;

2. Order preserving, i.e., σ(f(i)) > σ(f(j)) whenever π(i) > π(j).

In other words, π is embeddable in σ if it has the same order type as a
subsequence of σ obtained by erasing some of σ's entries. In what follows, we shall
use the notions "π is embeddable in σ" and "π is a sub-permutation of σ"
interchangeably, and denote this by π ⪯ σ.
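To make the definition concrete, here is a brute-force check that tries every increasing choice of k positions in σ. It is a sketch for illustration only; the function name and the list-based input format are our assumptions. It is the exhaustive search over all size-k subsequences, not the algorithm developed in this paper.

```python
from itertools import combinations

def find_embedding(pi, sigma):
    """Return an embedding of pi into sigma as 1-based positions, or None."""
    k = len(pi)
    for positions in combinations(range(len(sigma)), k):   # strictly increasing
        # Order preserving: the chosen entries of sigma compare exactly as pi does.
        if all((sigma[positions[a]] > sigma[positions[b]]) == (pi[a] > pi[b])
               for a in range(k) for b in range(a + 1, k)):
            return [p + 1 for p in positions]              # 1-based, as in the text
    return None

print(find_embedding([1, 3, 2], [2, 4, 1, 3]))   # [1, 2, 4]: entries 2, 4, 3
print(find_embedding([3, 2, 1], [1, 2, 3, 4]))   # None
```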



M.M. Halldorsson (Ed.): SWAT 2000, LNCS 1851, pp. 490-EMl 2000. 
(c) Springer- Verlag Berlin Heidelberg 2000 
Families of permutations with a fixed set of forbidden sub-permutations occur
naturally, e.g., in the study of permutations obtainable by using a single stack,
or, more notably, in the well-known open conjecture of Stanley-Wilf, claiming
that for a fixed π the size of the families F_n = {σ ∈ S_n | π ⋠ σ} grows at
most exponentially in n. Despite a considerable research effort,
the conjecture remains largely open. It seems that one of the main obstacles
towards proving it is the lack of convenient criteria for establishing whether
σ ∈ F_n or not. In particular, the problem of checking whether π ⪯ σ holds is
computationally difficult: it has been shown to be NP-complete.

In this paper we concentrate on the following version of the latter question, 
to be called the Sub-Permutation problem, (SP): 

Let π ∈ S_k be fixed. Given an input σ ∈ S_n, determine whether the relation
π ⪯ σ holds.

To distinguish between π and σ, we shall call the former the structure
permutation, and the latter the goal permutation.

The above problem is, of course, polynomially tractable, and the brute force
approach (checking all the sub-permutations of σ of size k) gives an
O(k · (n choose k)) upper bound. A closer look reveals that the difficulty of the
SP problem crucially depends on the structure permutation π, and while, for
instance, π = (12...k) leads to a problem of complexity O(kn), there is no
obvious way to substantially improve upon the brute-force upper bound for a
random π ∈ S_k.

What is the correct complexity of an SP problem for a given π? To the best of our
knowledge, this question has not been addressed so far in the literature. We develop
an algorithm whose performance is bounded by O(c_k + k n^{C(π)+1}), where C(π)
is a naturally defined function of π. We prove a general 0.35k + o(k) upper
bound on C(π), but also, unfortunately, show that for most π ∈ S_k it holds that
C(π) = Ω(k). This, however, is not always the case. We present a number of
naturally defined classes of permutations π for which C(π) is O(√k) and less.

While the exact computation of C(π) appears to be computationally difficult
(and we do not know how to do it faster than in O(k 2^k)), we indicate how a
polylogarithmic approximation of C(π) can be obtained in time polynomial in k.

Thus, we make first steps in the study of the fascinating question of the
complexity of an SP problem as a function of the structure permutation π ∈ S_k.
The most interesting related open question remains: can the SP problem
always be solved in time c_k n^{O(1)} for π ∈ S_k?

2 The Generic Algorithm 

We shall be mainly interested here in either finding a (single) embedding π ⪯ σ,
or concluding that no such embedding exists. However, after describing the
procedure, we shall indicate how it can be modified in order to find all such
embeddings.

Our approach is basically that of dynamic programming. The structure
permutation π will be gradually exposed, and at each stage the corresponding tables
will be suitably refined.

Consider a (for the present, fixed) chain of subsets of [1..k], ∅ = A_0 ⊂
A_1 ⊂ ... ⊂ A_k = [1..k], such that each A_i \ A_{i−1} consists of a single element.
Setting τ(i) = A_i \ A_{i−1}, we get a 1-1 function τ : [1..k] → [1..k], which we
shall call the order of exposure corresponding to (and defining) the chain of
subsets. The restrictions π_i of π to the A_i form a chain of sub-permutations
∅ = π_0 ⪯ π_1 ⪯ ... ⪯ π_k = π. Rigorously speaking, each π_i is a 1-1 function from
A_i to [1..k]; however, since we shall be interested only in the order type of π_i,
without a risk of confusion, we shall think of it as a permutation in S_i.

Roughly, our strategy is to create tables T_0, T_1, ..., T_k, where T_i will store
embeddings f_i : A_i → [1..n] of π_i into σ. In order to get T_i, we take T_{i−1} and
extend, if possible, each embedding f_{i−1} ∈ T_{i−1} to a set of embeddings f_i.

How does one obtain legal extensions f_i of f_{i−1}? Since f_i must agree with
f_{i−1} on A_{i−1}, the question is where the new element τ(i) can be mapped. The
restrictions on the value of f_i(τ(i)) are:

Monotonicity: Let p^- ∈ A_{i−1} and p^+ ∈ A_{i−1} be, respectively, the next
to the left and the next to the right elements to τ(i) in A_i (we view A_i
as an ordered set). Then, in order to maintain monotonicity, it should hold that

    f_{i−1}(p^-) < f_i(τ(i)) < f_{i−1}(p^+) .

Order Preservation: Let q^- ∈ A_{i−1} and q^+ ∈ A_{i−1} be, respectively, the
elements satisfying π_i(τ(i)) − π_i(q^-) = 1 and π_i(q^+) − π_i(τ(i)) = 1. Then,
to maintain the order preservation property, it should hold that

    σ(f_{i−1}(q^-)) < σ(f_i(τ(i))) < σ(f_{i−1}(q^+)) .

It is not hard to see that the above restrictions on the value of f_i(τ(i))
are necessary and sufficient.

Definition 2. In what follows, we call {p^-, p^+, q^-, q^+} ⊆ A_{i−1} the significant
elements for the i-th stage. The set of all possible values {f_i(τ(i))} in a legal
extension of a given f_{i−1} is completely determined by the values of f_{i−1} on the
set of the significant elements for the i-th stage. Note that this set may contain
fewer than four elements: e.g., for i = 1, it is empty.

Getting back to our strategy, it immediately becomes clear that saving in
T_i the entire set of possible embeddings f_i : A_i → [1..n] is infeasible: the size
of T_i can get as large as (n choose i), and we gain nothing compared to the brute force
algorithm. How, then, could the content of T_i be condensed without incurring
an information loss?

Let us first examine the situation in a particularly clear and simple case
when π = (12...k), and the exposure order τ is 1, 2, 3, ..., k (or, equivalently, A_i =
{1, 2, ..., i} for i = 1, ..., k). Clearly, the only information about any particular f_i
which will be used in the subsequent extensions is the value of f_i(i). Thus, storing
in T_i the values of f_i(i) alone, we get tables of size just O(n) which hold all the
necessary information. The actual embedding can be reconstructed, if desired,
using back-pointers where each f_i(i) points to one of its fathers f_{i−1}(i − 1).
For a better understanding of the nature of the gain in the above example,
let us define for each i = 0, 1, ..., k − 1 the set U_i ⊆ [1..k] of unforgettable elements:

Definition 3. Let U_i, the i-th set of unforgettable elements, be defined as the
union of all the significant elements for stages i + 1, ..., k. That is,

    U_i = ⋃_{j=i+1}^{k} {p_j^-, p_j^+, q_j^-, q_j^+} .

The definition of the significant elements immediately implies the following
proposition:

Proposition 1. For any embedding f_i : A_i → [1..n], the only elements of A_i
whose value will ever be significant for the subsequent extensions of f_i are U_i ∩ A_i.

Therefore, following the usual logic of dynamic programming, it suffices to store
in T_i only the values of the f_i's on U_i ∩ A_i. In fact, this is precisely what was done
in the above simple example, since in that case U_i ∩ A_i = {i}.

Before going on with the formal description of the emerging algorithm, let
us have a different, clearer look at the system of sets {U_i ∩ A_i}.

Definition 4. Let π ∈ S_k be a structure permutation. The incidence graph G_π
is an undirected multi-graph on k vertices with two types of edges: blue ones and
red ones. Formally, V(G_π) = [1..k] and E(G_π) = E_blue(G_π) ∪ E_red(G_π), where

    E_blue(G_π) = { (i, j) | |i − j| = 1 };   E_red(G_π) = { (i, j) | |π(i) − π(j)| = 1 } .

The following proposition establishes a surprising connection between U_i ∩ A_i
and the boundary of A_i in G_π:

Proposition 2. Define ∂_π A, the boundary of a subset A of V(G_π), as ∂_π A =
{v ∈ A | Γ(v) ⊄ A}, where Γ(v) is the set of neighbours of v in G_π. Then

    U_i ∩ A_i = ∂_π A_i .

Proof. Let v ∈ ∂_π A_i; this means that v possesses a neighbour u ∉ A_i. Let j > i
be the stage when u is first exposed. It is readily checked that v is a significant
element for the j-th step, and therefore v ∈ U_i.

Conversely, let v ∈ U_i ∩ A_i. Assume v is significant for the j-th step, j > i.
Then, clearly, v ∈ ∂_π A_{j−1}, and since A_i ⊆ A_{j−1}, this implies v ∈ ∂_π A_i. ■
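Proposition 2 can be checked mechanically on small instances. The sketch below uses our own conventions, none of which come from the paper: positions are 0-based, π is a list of distinct integers, and the helper names are hypothetical. It computes the significant elements of every stage, forms U_i, and compares U_i ∩ A_i with the boundary of A_i in the incidence graph.

```python
def neighbours(pi, v):
    """Blue (adjacent position) and red (adjacent value) neighbours of v."""
    k = len(pi)
    where = {value: pos for pos, value in enumerate(pi)}
    blue = {u for u in (v - 1, v + 1) if 0 <= u < k}
    red = {where[x] for x in (pi[v] - 1, pi[v] + 1) if x in where}
    return blue | red

def boundary(pi, A):
    """Elements of A that have a neighbour outside A."""
    return {v for v in A if neighbours(pi, v) - A}

def significant(pi, exposed, t):
    """Significant elements {p-, p+, q-, q+} when t is exposed after `exposed`."""
    picks = [max((v for v in exposed if v < t), default=None),
             min((v for v in exposed if v > t), default=None),
             max((v for v in exposed if pi[v] < pi[t]),
                 key=lambda v: pi[v], default=None),
             min((v for v in exposed if pi[v] > pi[t]),
                 key=lambda v: pi[v], default=None)]
    return {v for v in picks if v is not None}

def check_proposition_2(pi, tau):
    """True if U_i ∩ A_i equals the boundary of A_i for every i = 0..k."""
    k = len(pi)
    sig = [significant(pi, set(tau[:j]), tau[j]) for j in range(k)]
    for i in range(k + 1):
        A_i = set(tau[:i])
        U_i = set().union(*sig[i:])       # significant elements of later stages
        if U_i & A_i != boundary(pi, A_i):
            return False
    return True
```

For example, `check_proposition_2([2, 4, 1, 3], [2, 0, 3, 1])` runs the comparison for one structure permutation and one exposure order.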

We may now present an algorithm for the SP problem. The main data
structure will be a sequence of tables {T_i}_{i=0}^{k}. The rows of each T_i will contain a
numerical field for each element of ∂_π A_i and an additional single field for a
pointer. Each row in T_i will correspond to some embedding f_i : A_i → [1..n] of π_i
into σ. The numerical fields will carry the values of f_i on ∂_π A_i = U_i ∩ A_i, while
the pointer will point to some f_{i−1} (a row of T_{i−1}) whose extension resulted in
f_i. The rows of T_i must all be different with respect to the numerical fields.
Observe that T_k has a single row containing only the pointer, while T_0 is
in fact an empty table.
ALG-SP_{π,τ}(σ):

INPUT:  σ ∈ S_n - a goal permutation;
        π ∈ S_k - a structure permutation; (* fixed *)
        τ ∈ S_k - an exposure order (* currently fixed *)

OUTPUT: an embedding f_k : [1..k] → [1..n] of π in σ, or a message "NONE";

set T_0 = NULL;
for i = 1 to k do {
    set T_i = ∅;
    for each row of T_{i−1} (* corresponding to some f_{i−1} *) do {
        (* try to expand f_{i−1} to f_i *)
        for j = 1 to n do {
            (* check whether j is a legitimate value for f_i(τ(i)) *)
            check whether f_{i−1}(p^-) < j < f_{i−1}(p^+);            (* Monotonicity *)
            check whether σ(f_{i−1}(q^-)) < σ(j) < σ(f_{i−1}(q^+));  (* Order Preservation *)
            if both checks succeeded, do {
                create a new row of T_i;
                fill the numerical fields suitably,
                    using data from the current row of T_{i−1} and the value j;
                direct the pointer to that row;
                if the new row did not occur yet in T_i, add it to T_i;
            }}}}
if T_k is empty return "NONE";
else, using pointers, reproduce an actual embedding f_k, and return it.

The correctness of the algorithm follows from the definition of the significant
elements, and Proposition 2. What is its time complexity? First, we need to
compute all {∂_π A_i}, which will take a time at most quadratic in k. Then, at
every stage i, O(n) operations are performed for each row of T_{i−1}. In addition,
one must take care not to create identical rows in T_i. This can be done, e.g., by
performing an on-line bucket sort, using a table of size n^{|∂_π A_i|}. (Note that
|∂_π A_i| ≤ 1 + |∂_π A_{i−1}|.) Thus, the entire i-th stage can be implemented in time
O(n^{1+|∂_π A_{i−1}|}). Finally, the reconstruction of the resulting f_k may require an
additional time O(k). Altogether, we get

    O(k^2) + O( Σ_{i=1}^{k} n^{1+|∂_π A_{i−1}|} ) .

Remark: If our goal were to find all the embeddings π ⪯ σ, instead of
back-pointers we would use forward-pointers from f_{i−1} to all its extensions f_i; notice
that there are at most n such. The upper bound for the time it takes to fill up the
tables would change by at most a multiplicative constant. Then, using a standard
algorithm for DAG's, we could find all the embeddings (corresponding to the
paths from T_0 to T_k) in time proportional to that spent so far, plus k times the
total number of such embeddings. ■
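A compact executable rendering of the procedure may help in following the bookkeeping. It is a sketch under our own assumptions rather than the authors' implementation: positions are 0-based, each table is a Python dictionary keyed by the values of f_i on the boundary (hashing stands in for the bucket sort), and T_0 is represented by one empty row so that the first stage has something to extend.

```python
import math

def alg_sp(pi, sigma, tau=None):
    """Return an embedding of pi into sigma (0-based positions), or None."""
    k, n = len(pi), len(sigma)
    tau = list(range(k)) if tau is None else list(tau)
    where = {value: pos for pos, value in enumerate(pi)}

    def neighbours(v):                       # blue and red neighbours in G_pi
        nbrs = {u for u in (v - 1, v + 1) if 0 <= u < k}
        nbrs |= {where[x] for x in (pi[v] - 1, pi[v] + 1) if x in where}
        return nbrs

    exposed = set()
    fields = [()]                            # boundary elements indexing each table
    tables = [{(): None}]                    # T_0: one empty row (our convention)
    for t in tau:
        prev_fields, prev_table = fields[-1], tables[-1]
        exposed.add(t)
        new_fields = tuple(sorted(v for v in exposed if neighbours(v) - exposed))
        # Significant elements; they always lie on the previous boundary.
        p_lo = max((v for v in prev_fields if v < t), default=None)
        p_hi = min((v for v in prev_fields if v > t), default=None)
        q_lo = max((v for v in prev_fields if pi[v] < pi[t]),
                   key=lambda v: pi[v], default=None)
        q_hi = min((v for v in prev_fields if pi[v] > pi[t]),
                   key=lambda v: pi[v], default=None)
        new_table = {}
        for key in prev_table:
            f = dict(zip(prev_fields, key))
            lo = f[p_lo] if p_lo is not None else -1
            hi = f[p_hi] if p_hi is not None else n
            s_lo = sigma[f[q_lo]] if q_lo is not None else -math.inf
            s_hi = sigma[f[q_hi]] if q_hi is not None else math.inf
            for j in range(lo + 1, hi):                  # monotonicity
                if s_lo < sigma[j] < s_hi:               # order preservation
                    f[t] = j
                    new_key = tuple(f[v] for v in new_fields)
                    new_table.setdefault(new_key, (key, j))   # back-pointer
        fields.append(new_fields)
        tables.append(new_table)
    if not tables[-1]:
        return None
    embedding, key = [None] * k, ()
    for i in range(k, 0, -1):                # follow back-pointers from T_k
        prev_key, j = tables[i][key]
        embedding[tau[i - 1]] = j
        key = prev_key
    return embedding
```

Passing a different `tau` changes only the table sizes, not the answer, which is what makes the choice of exposure order a pure efficiency question.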
The parameter which governs the complexity of the algorithm is max_i |∂_π A_i|.
Let us give this parameter a name. For a structure permutation π ∈ S_k and an
exposure order τ ∈ S_k, define C_τ(π) = max_{i ∈ [1..k]} |∂_π A_i|. By the preceding
discussion, the complexity of ALG-SP_{π,τ} is bounded by O(k^2 + k n^{C_τ(π)+1}).
Now, observe that the exposure order does not have to be fixed, and in fact it
makes good sense to choose the best possible τ for the given π. Thus, we arrive
at the key definition of this section:

Definition 5. Define C(π), the complexity of a permutation π ∈ S_k, as

    C(π) = min_τ C_τ(π) .
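For small k, both C_τ(π) and C(π) can be computed directly. The sketch below is ours: the function names and the bitmask encoding are assumptions, and the subset recurrence is one standard way to reach a running time of O(k 2^k), not necessarily the method the authors have in mind. It uses the fact that the best exposure of a set S ends with some last element v, so best(S) = max(|∂_π S|, min over v ∈ S of best(S \ {v})).

```python
def _neighbour_masks(pi):
    """Bitmask of blue and red neighbours for each 0-based position."""
    k = len(pi)
    where = {value: pos for pos, value in enumerate(pi)}
    masks = []
    for v in range(k):
        mask = 0
        for u in (v - 1, v + 1):
            if 0 <= u < k:
                mask |= 1 << u
        for x in (pi[v] - 1, pi[v] + 1):
            if x in where:
                mask |= 1 << where[x]
        masks.append(mask)
    return masks

def c_tau(pi, tau):
    """C_tau(pi): the largest boundary over the prefixes of the exposure order."""
    nbr, S, worst = _neighbour_masks(pi), 0, 0
    for t in tau:
        S |= 1 << t
        size = sum(1 for v in range(len(pi)) if S >> v & 1 and nbr[v] & ~S)
        worst = max(worst, size)
    return worst

def complexity(pi):
    """C(pi): the minimum of C_tau(pi) over all exposure orders tau."""
    k, nbr = len(pi), _neighbour_masks(pi)
    best = [0] * (1 << k)
    for S in range(1, 1 << k):
        members = [v for v in range(k) if S >> v & 1]
        size = sum(1 for v in members if nbr[v] & ~S)      # |boundary of S|
        best[S] = max(size, min(best[S ^ (1 << v)] for v in members))
    return best[-1]
```

For the identity permutation on at least two elements the identity exposure order already keeps every boundary at size 1, so `complexity` returns 1 there.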

Our discussion so far can be summarized by the following theorem: 

Theorem 1. The SP problem for a structure permutation π ∈ S_k can be solved
in time

    O(c_k + k n^{C(π)+1}) ,

where c_k is the time needed to compute the best τ.

Alternatively, if finding the best τ is too expensive, one can use a reasonably
good τ' and get time complexity O(c_k + k n^{C_{τ'}(π)+1}), where c_k is the time needed
to produce such τ'.

The remaining part of this paper is dedicated to the study of C(π). In
particular, we shall see how to produce τ' such that C(π) ≤ C_{τ'}(π) = 0.35k + o(k),
indicate how a poly-log approximation of C(π) can be obtained in time
polynomial in k, and discuss a number of concrete examples of π's.



3 Complexity of Permutations 

The graph G_π defined in the previous section provides a convenient way to
approach many questions related to C(π). Observe that in fact G_π is a multigraph
obtained by taking a union of two Hamiltonian paths, blue and red. Conversely,
any such multigraph corresponds to some π under the suitable labeling of vertices
(defined by the blue path). The maximum degree of G_π is at most 4.



3.1 General Upper Bounds 

The first natural question to ask is how well a fixed exposure order τ can perform.
The answer is given by the following proposition:

Proposition 3. Let π be a structure permutation in S_k, and let the exposure
order be Id, i.e., (1, 2, 3, ..., k). Then, C_Id(π) ≤ 2/3 · k + 1. Conversely,
for any fixed exposure order τ there exists a structure permutation π such that
C_τ(π) ≥ 2/3 · k.
Proof. By the definition of the boundary, ∂_π A_i is in fact a union of two
boundaries: the blue one, created by the blue edges, and the red one, created by
the red edges. In the case of exposure order Id, the blue boundary is always
of size 1. Let us show that the size of the red boundary of any set A ⊆ [1..k] is
at most 2/3 · k. Since the red degree of any vertex v ∈ V(G_π) is at most 2 and at
least 1, we conclude that |∂^red A| ≤ 2|Ā|, where Ā = [1..k] \ A. Since
|A| + |Ā| = k and |∂^red A| ≤ |A|, the conclusion follows, yielding the first part
of the proposition.

For the second part, let us first show that it is true for τ = Id. It suffices
to construct a permutation π ∈ S_k such that the red boundary of A_{2k/3} has
size 2/3 · k. Here is
an interesting concrete example of such a permutation: Let k = 3^r, and define
ρ : [0..k − 1] → [0..k − 1] as a permutation which maps a number m in the
range to the number whose trinary representation (with leading zeroes) is the
reverse of the trinary representation of m. The image of [0..2/3 · k − 1]
consists of (all) numbers whose least significant digit is 0 or 1, while the image
of [2/3 · k..k − 1] consists of (all) numbers whose least significant digit is 2, and
we see that every m ∈ [0..2/3 · k − 1] has a red edge to [2/3 · k..k − 1]. Thus, ρ
has the desired property.

To complete the proof of the second part, observe that the structure
permutation π = ρ ∘ τ^{−1} with respect to the exposure order τ (which can be viewed
as a permutation of the range) has red boundaries isomorphic to those of ρ with
respect to Id. ■
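The digit-reversal example is easy to reproduce. The following sketch (helper names are ours, and the exponent is called r by assumption) builds ρ for k = 3^r and lists the elements of the lower two thirds that have a red edge into the upper third. In the small cases tried here, every such element except m = 0, whose image 0 has no smaller neighbour, has such an edge, so the count is 2k/3 − 1.

```python
def ternary_reversal(r):
    """rho on [0 .. 3**r - 1]: reverse the r-digit base-3 representation."""
    k = 3 ** r
    rho = []
    for m in range(k):
        digits = [(m // 3 ** e) % 3 for e in range(r)]     # least significant first
        rho.append(sum(d * 3 ** (r - 1 - e) for e, d in enumerate(digits)))
    return rho

def red_edges_into_upper_third(rho):
    """Elements m < 2k/3 with a red neighbour (value differing by 1) at index >= 2k/3."""
    k = len(rho)
    cut = 2 * k // 3
    where = {value: pos for pos, value in enumerate(rho)}
    return [m for m in range(cut)
            if any(where.get(x, -1) >= cut for x in (rho[m] - 1, rho[m] + 1))]

rho = ternary_reversal(3)                       # k = 27
crossing = red_edges_into_upper_third(rho)      # 17 of the 18 lower elements
```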

Two remarks are due. First, we have treated τ so far as the exposure order,
which is of course a permutation of the range. We have refrained so far from
calling τ a permutation to avoid unnecessary confusion. Second, as we shall see
later, there exists an exposing order τ such that C_τ(ρ) = O(√k).
Thus, the identity permutation can be far off the best exposure order.

The next natural question is how well does a random exposure order behave. 



Theorem 2. For any structure permutation $\pi \in S_k$, almost all exposure orders
$\tau$ satisfy $C_\tau(\pi) \le (0.54 + o(1))k$.

Proof. The proving mechanism used here and in following theorems of this sec- 
tion is: 

1. Given a structure permutation $\pi \in S_k$, define a random process $P$ which
produces the exposure order $\tau \in S_k$ by exposing $[1..k]$ vertex by vertex. (Between
the consecutive exposures there can be some non-exposing activity.)
The process $P$ will be described by specifying the conditional distribution
according to which the next step is performed.

2. For each $i \le k$ corresponding to the exposure of the $i$-th vertex, define the
random variable $X_i = |\partial_\pi(A_i)|$, compute its expectation $E[X_i]$, and show
that $X_i$ is well concentrated, i.e., that for some function $g(k) = o(k)$ it holds

$$\Pr[X_i - E[X_i] > g(k)] \le o(1/k) .$$

After having succeeded in proving this property, we may conclude that a random
permutation $\tau$ drawn according to $P$ almost surely satisfies

$$C_\tau(\pi) \le \max_i E[X_i] + g(k) .$$

3. In order to prove the concentration property, the standard martingale tech- 
nique is used (see, e.g., |2| and jS| for a detailed description of how the 
martingales are used in Discrete Mathematics). For each $i$ and $t = 0, 1, \ldots, k$
we define a random variable $Y_i^t$ as the expected value of $X_i$ after the exposure
of the first $t$ vertices. Clearly, $E[X_i] = Y_i^0$, and $\{Y_i^t\}_{t=0}^{k}$ form a martingale,
i.e., $E[Y_i^{t+1} \mid Y_i^t] = Y_i^t$. Then we establish the Bounded Differences Property,
i.e., that there exists a constant $L$ such that $|Y_i^{t+1} - Y_i^t| \le L$ for all $i$
and $t$, and conclude, by the Azuma Inequality, that

$$\Pr[X_i - E[X_i] > 2L\sqrt{k \ln k}] < 1/k^2 .$$
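For completeness, here is how the threshold $2L\sqrt{k \ln k}$ produces the $1/k^2$ tail. The sketch assumes the standard one-sided form of the Azuma inequality for a martingale with $k$ steps and differences bounded by $L$; the symbol $\lambda$ is introduced only for this calculation.

```latex
% One-sided Azuma inequality for a martingale Y^0, ..., Y^k
% with |Y^{t+1} - Y^t| \le L for all t:
\Pr\bigl[\,Y^k - Y^0 > \lambda\,\bigr] \;\le\; \exp\!\Bigl(-\frac{\lambda^2}{2kL^2}\Bigr).

% Here Y^k = X_i and Y^0 = E[X_i]. Substituting \lambda = 2L\sqrt{k \ln k}:
\exp\!\Bigl(-\frac{4L^2\, k \ln k}{2kL^2}\Bigr) \;=\; \exp(-2\ln k) \;=\; \frac{1}{k^2}.

% A union bound over the k indices i then leaves total failure probability
% at most k \cdot k^{-2} = 1/k = o(1).
```

In particular one may take $g(k) = 2L\sqrt{k \ln k} = o(k)$ in step 2.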



Let us see how the above strategy works for analyzing a uniformly random
$\tau \in S_k$. We define $P$ by saying that at each stage one of the unexposed vertices
is drawn uniformly at random.

Let us first estimate $E[X_i]$. What is the probability that a vertex $v \in V(G)$
belongs to $\partial_\pi A_i$? A simple combinatorial argument shows that if $v$ has degree 4
(counting multiple edges as a single edge), then

$$\Pr[v \in \partial_\pi A_i] = \frac{i}{k}\left(1 - \frac{i-1}{k-1}\cdot\frac{i-2}{k-2}\cdot\frac{i-3}{k-3}\cdot\frac{i-4}{k-4}\right) \le h(i/k) + o(1) ,$$

where $h$ is the real valued function $h(x) = x(1 - x^4)$. If the degree of $v$ is less than
4, the above probability decreases. Thus, by linearity of expectation,

$$\max_i E[X_i] \le k \max_{x \in [0,1]} h(x) + o(k) \le 0.54k + o(k) .$$

In order to complete the proof we have to verify that the exposure martingale
$\{Y_i^t\}$ has the Bounded Differences Property. But this is easy: since an exposure
of an additional vertex can influence at most 4 other vertices, we get at
once $|Y_i^{t+1} - Y_i^t| \le 1 + 4 = 5$. ■
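The constant in the theorem can be checked numerically. The sketch below maximizes $h(x) = x(1 - x^4)$ in closed form and simulates a uniformly random exposure order. It rests on two assumptions that are not spelled out in this section: that the graph is the union of a blue path joining $i$ and $i+1$ with a red path joining $u$ and $v$ whenever $\pi(u)$ and $\pi(v)$ are consecutive, and that the boundary of $A$ is the set of vertices of $A$ having a neighbour outside $A$. All function names are hypothetical.

```python
import random


def build_graph(pi):
    """Adjacency sets of the assumed graph: blue path plus red path induced by pi."""
    k = len(pi)
    inv = [0] * k
    for u, p in enumerate(pi):
        inv[p] = u
    adj = [set() for _ in range(k)]
    for i in range(k - 1):
        adj[i].add(i + 1)            # blue edge {i, i+1}
        adj[i + 1].add(i)
        u, v = inv[i], inv[i + 1]    # red edge joins the preimages of i and i+1
        adj[u].add(v)
        adj[v].add(u)
    return adj


def max_boundary(adj, order):
    """Largest boundary size over all prefixes of the exposure order."""
    k = len(adj)
    exposed = [False] * k
    unexposed_nbrs = [len(a) for a in adj]
    size = best = 0
    for v in order:
        exposed[v] = True
        for u in adj[v]:
            unexposed_nbrs[u] -= 1
            if exposed[u] and unexposed_nbrs[u] == 0:
                size -= 1            # u just left the boundary
        if unexposed_nbrs[v] > 0:
            size += 1                # v enters the boundary
        best = max(best, size)
    return best


def h_max():
    """Maximum of h(x) = x(1 - x^4) on [0, 1], attained where 1 - 5x^4 = 0."""
    x = 5 ** -0.25
    return x * (1 - x ** 4)


def simulate(k, seed):
    """Max boundary divided by k for a random structure permutation and order."""
    rng = random.Random(seed)
    pi = list(range(k))
    rng.shuffle(pi)
    order = list(range(k))
    rng.shuffle(order)
    return max_boundary(build_graph(pi), order) / k


if __name__ == "__main__":
    print(h_max())
    print(simulate(2000, 0))
```

The closed-form maximum is $0.8 \cdot 5^{-1/4}$, which is slightly below 0.54, in agreement with the bound stated in the theorem.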

How can the constant 0.54 of Theorem 2 be improved? On one hand, the
process P should be more “sensitive” to the current boundary, and attempt not 
to increase it needlessly. On the other hand, if one wishes to follow the same 
general strategy, P should be “stable” in the sense that the exposure of an extra 
vertex could have only a limited influence on the future behaviour of P. Such 
behaviour is required to ensure the Bounded Differences Property, needed for 
the analysis of P. 

A natural improvement on the previous random process would be to fill up 
the “holes” (i.e., the unexposed vertices all of whose neighbours are exposed) 
upon their creation. Clearly, such an operation can only be beneficial for the 
size of the boundaries, and there is no gain in delaying it. We call the new 
process $P_1$.

Theorem 3. Almost all exposure orders $\tau \in S_k$ produced by $P_1$ satisfy
$C_\tau(\pi) \le k(0.46 + o(1))$.

Proof. Sketch: We follow closely the proof of Theorem 2. Using a similar, but
much more involved technical analysis, we arrive at the conclusion that

$$\Pr[v \in \partial_\pi A_i] \le h_1(|A_i|/k) + o(1) ,$$

where $A_i$ is the set of all exposed vertices after the exposure of the $i$-th truly
random (i.e., not forced by hole filling) vertex, and $h_1$ is a real valued function
determined by the local structure of $G_\pi$. It is found by a lengthy case analysis;
e.g., in the typical case when $G_\pi$ has (locally) degree 4, and no cycles of length

< 4, h{x) = (1 ~ (1 — a;^*) . The maximum value of hi{x) 

on [0, 1] is 0.46. Therefore, by linearity of expectation,

$$\max_i E[X_i] \le k\left(o(1) + \max_{x \in [0,1]} h_1(x)\right) \le (0.46 + o(1))\,k .$$

As before, in order to complete the proof it remains to check that the exposure
martingale $\{Y_i^t\}$ has the Bounded Differences Property. Although at first
glance one might suspect that the hole filling process might have a cascading
effect, it actually cannot. It is not hard to see that the exposure of an
extra vertex at any particular moment can influence at most $1 + 4 + 3 \cdot 4 = 17$
vertices (the vertex itself, its neighbours, and the neighbours' neighbours), and
thus $|Y_i^{t+1} - Y_i^t| \le 17$. We postpone a detailed explanation of this proof to the
full version of the paper. ■
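The hole-filling process can also be simulated directly. The following is a minimal sketch under the same assumptions as before (the graph is the union of a blue path on consecutive vertices and a red path on vertices with consecutive $\pi$-values, and the boundary consists of exposed vertices with an unexposed neighbour). The names `build_graph` and `hole_filling_max_boundary` are hypothetical, and the simulation is an illustration, not a substitute for the analysis.

```python
import random


def build_graph(pi):
    """Adjacency sets of the assumed graph: blue path plus red path induced by pi."""
    k = len(pi)
    inv = [0] * k
    for u, p in enumerate(pi):
        inv[p] = u
    adj = [set() for _ in range(k)]
    for i in range(k - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
        u, v = inv[i], inv[i + 1]
        adj[u].add(v)
        adj[v].add(u)
    return adj


def hole_filling_max_boundary(adj, rng):
    """Expose random unexposed vertices; after each one, fill the holes it creates."""
    k = len(adj)
    exposed = [False] * k
    unexposed_nbrs = [len(a) for a in adj]
    size = 0
    best = 0

    def expose(v):
        nonlocal size
        exposed[v] = True
        for u in adj[v]:
            unexposed_nbrs[u] -= 1
            if exposed[u] and unexposed_nbrs[u] == 0:
                size -= 1
        if unexposed_nbrs[v] > 0:
            size += 1

    order = list(range(k))
    rng.shuffle(order)               # skipping exposed vertices below amounts to
    for v in order:                  # a uniform choice among the unexposed ones
        if exposed[v]:
            continue
        expose(v)
        for u in adj[v]:
            # A hole is an unexposed vertex all of whose neighbours are exposed.
            if not exposed[u] and unexposed_nbrs[u] == 0:
                expose(u)
        best = max(best, size)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    k = 2000
    pi = list(range(k))
    rng.shuffle(pi)
    print(hole_filling_max_boundary(build_graph(pi), rng) / k)
```

Filling a hole only touches vertices that are already exposed, so it cannot create further holes; this is the reason no cascade occurs in the sketch.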

The process $P_1$ can be further improved with respect to the sizes of the
boundaries. Besides hole filling, there is one more beneficial operation: if an
exposure of a vertex does not increase the size of the current boundary, there is
definitely no damage in exposing it. However, although we would like to perform
this operation whenever possible, it may cause a cascading effect, destabilizing
the process. In order to take care of this problem, for each $d \ge 1$ we introduce
an “approximating” stable process defined as follows:

Every vertex will have a “height” in $[1..d] \cup \{\infty\}$. The vertices with height $< \infty$
will be precisely the exposed vertices. At each stage $t$, for each $m = 1, \ldots, d$, let
$A_t^m$ be the set of all vertices whose height does not exceed $m$. As long as there
exist vertices $v$ of height $> m + 1$ whose addition to $A_t^m$ does not increase the
boundary of this set, pick randomly such a vertex $v$, and change its height to
$m + 1$. If $v$ was previously unexposed, it gets exposed during this operation.
Now, if there are no such vertices, we pick a random vertex in $[1..k]$ and change
its height to 1 (exposing it if it was unexposed).

It can be shown (the details are postponed to the full version) that this process can
have but a limited cascading, and that any random step may influence at most
$1 + 4^{d+1}$ vertices. Furthermore, the analysis of the expected size of the boundary can
still be successfully carried out (although it becomes very messy). An analysis
of this process yields the following theorem:

Theorem 4. For every structure permutation $\pi \in S_k$, the above process produces with high
probability an exposure order $\tau$ such that $C_\tau(\pi) \le (0.35 + o(1))\,k$. Hence, the
complexity $C(\pi)$ of $\pi$ is at most $(0.35 + o(1))\,k$.

Before concluding this section, we would like to mention an entirely different
approach for determining an exposure order $\tau$ for the given $\pi$. Take the Laplacian
matrix of the graph, find its eigenvector $e_1$ corresponding to the first positive
eigenvalue, and expose the vertices in the increasing order of values of the entries
of $e_1$. Numerical simulations seem to indicate that this method is superior to
all other methods described in this section, leading to a constant about 0.22.
There is also a good intuition why it should work (see, e.g., 0 for a related 
discussion). Unfortunately, we do not know how to prove this. 

3.2 Approximation and a General Lower Bound 

So far we have worked with the vertex boundary $\partial A$ of the set $A \subseteq G$. Let us
also introduce the edge boundary $DA$ of $A \subseteq G$, defined by

$$DA = \{ e \in E(G) \mid e \in E(A, \bar{A}) \} .$$

Since our $G$ has degree bounded by 4, it holds

$$|\partial_\pi A| \le |D_\pi A| \le 4|\partial_\pi A| . \qquad (1)$$

The edge boundary of a set is one of the basic terms of Graph Theory, and
its introduction permits us to employ the existing machinery. Here is how it can be
used to efficiently compute an approximation of $C(\pi)$:

Theorem 5. An exposure order $\tau$ for which $C_\tau(\pi) \le O(\log^2 k)\, C(\pi)$ can be
constructed in time polynomial in $k$.

Proof. Finding the order of vertex exposure $\tau$ of $V(G)$ for an (arbitrary) input
graph $G$ which minimizes the maximal $|DA_i|$ is a well known classical problem
called the minimum cut linear arrangement problem. It is NP-complete. How- 
ever, Leighton and Rao fS] have designed a polynomial 0(log^ k) approximation 
for this problem. Using their algorithm on our graph $G_\pi$, and keeping in mind
(1), we get the desired approximation. ■

Next, we prove that there exist permutations such that $C(\pi) = \Omega(k)$. In fact,
almost all permutations have this property.

Theorem 6. Let $\pi \in S_k$ be a random, uniformly chosen permutation. Then
there exists a universal constant $c > 0$ such that

$$\Pr[\,C(\pi) < ck\,] \le o(1) .$$

Proof. Assume for convenience that $k$ is even. Call a graph $G$ $\alpha$-rich if any
bisection of $G$ (i.e., partition of $V(G) = [1..k]$ into two equal parts) defines an
edge-cut of size at least $\alpha k$. By (1), it suffices to show that a random $G$ obtained
by taking a union of two randomly and independently chosen Hamiltonian paths
$P_1$ and $P_2$ on the vertex set $[1..k]$ is $\alpha$-rich for some constant $\alpha > 0$.

Denote by $\mathcal{P}$ the uniform probability space over all the (multi-)graphs obtained
by taking a union of two random Hamiltonian paths. We shall prove that
there exist constants $\alpha, \varepsilon > 0$ such that for every bisection $(V_1, V_2)$ the probability
that a $\mathcal{P}$-random graph has a sparse (i.e., $< \alpha k$) edge-cut with respect to
$(V_1, V_2)$ is less than $2^{-(1+\varepsilon)k}$. That is, for every bisection $(V_1, V_2)$, it holds

$$\Pr_{G \in \mathcal{P}}\bigl[\,|E_G \cap E(V_1, V_2)| < \alpha k\,\bigr] < 2^{-(1+\varepsilon)k} . \qquad (2)$$

Since there are at most $2^{k-1} - 1$ bisections of the vertices $[1..k]$, the probability that
there exists a “meager” bisection (i.e., one yielding an edge-cut of size $< \alpha k$) is
less than $2^{k-1} \cdot 2^{-(1+\varepsilon)k} = o(1)$. Thus, almost surely, a $\mathcal{P}$-random graph is $\alpha$-rich.

It remains to establish $\alpha, \varepsilon$ for which Inequality (2) is true. Let $(V_1, V_2)$ be a
fixed bisection of $[1..k]$. Let us examine the distribution of the size of the edge
cut defined by this bisection for a random $G \in \mathcal{P}$; we shall call this random
variable $S$.

For each path $P_i$, $i = 1, 2$, define the indicator binary random variables
$\{X_{i,j}\}_{j=1}^{k}$, which indicate whether $P_i(j) \in V_1$ or $P_i(j) \in V_2$. We shall think of
the set of these indicators $\{X_{i,j}\}$ as a binary string $X$ of length $2k$ ordered in
the following way:

$$X = (X_{1,1}, X_{1,2}, \ldots, X_{1,k};\; X_{2,1}, X_{2,2}, \ldots, X_{2,k}) .$$



Observe that the string $X$ is uniformly distributed over all binary strings
of length $2k$ with the property that both the first $k$ bits and the last $k$ bits are
balanced (i.e., they contain $k/2$ 0's and $k/2$ 1's). Observe also that

$$S = S(X) = \sum_{i=1,2;\; 1 \le j < k} X_{i,j} \oplus X_{i,j+1} ,$$

where $\oplus$ is the addition operation in $Z_2$.

A simple calculation shows that even after dropping the balancedness requirements,
the number of binary strings $Y$ of length $2k$ for which $S(Y) < \alpha k$ is

$$4 \sum_{j < \alpha k} \binom{2k-2}{j} \le 2^{H(\alpha/2) \cdot 2k + o(k)} ,$$

where $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ is the Entropy function, and the
standard approximation used is from, e.g., [2]. Recalling that the string $X$ is
uniformly distributed over strings of length $2k$ which are balanced with respect
to both the first $k$ and the last $k$ bits, we conclude that

$$\Pr[S < \alpha k] \le \frac{2^{H(\alpha/2) \cdot 2k + o(k)}}{2^{2k}/O(k)} \le 2^{-(1 - H(\alpha/2)) \cdot 2k + o(k)} .$$

Taking $\alpha$ such that $H(\alpha/2) < 0.5$ yields Inequality (2) and completes the proof
of the theorem. ■

3.3 Special Cases

In this section we consider two different types of “linear” permutations $\pi \in S_k$,
and show that both types have complexity at most $O(\sqrt{k})$. Throughout this
section, we think of permutations $\pi \in S_k$ as permutations of the set $[0..k-1]$,
and not of the usual set $[1..k]$.

The “linear” permutation $\pi$ of the first type comes from an invertible linear
transformation $A : Z_2^m \to Z_2^m$. Let $k = 2^m$. The value $\pi(i)$ of a number
$i \in [0..k-1]$ is defined by taking the binary representation of $i$ (with leading zeroes),
interpreting it as a vector $v \in Z_2^m$, applying $A$ to $v$ to get $u = A(v)$, and
interpreting $u$ as the binary representation of another number in $[0..k-1]$.

For the second type, we consider invertible linear transformations of the ring
$Z_k$ to itself. Formally,

Definition 6. Let $k = 2^m$ for some positive integer $m$. A permutation
$\pi \in S_k$ is called a linear permutation of the first type if it is an invertible linear
transformation $(Z_2)^m \to (Z_2)^m$, under identifying $Z_2^m$ and $[0..2^m - 1]$ by
means of the binary representation.

A permutation $\pi \in S_k$ is called a linear permutation of the second type if it
is a permutation of the ring $Z_k$ by an invertible linear transformation
$\pi(x) = ax + b \bmod k$.
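Both kinds of linear permutation are easy to generate explicitly. The sketch below is illustrative only: the function names are hypothetical, and the convention that bit $j$ of a number is coordinate $j$ of the vector, with each matrix row stored as a bit mask, is an assumption made here for compactness.

```python
def first_type(matrix_rows, m):
    """pi(i) = A * bits(i) over Z_2, where matrix_rows[r] is row r of A as a bit mask."""
    pi = []
    for i in range(1 << m):
        u = 0
        for r in range(m):
            # Coordinate r of A*v is the parity of the bits shared by row r and v.
            if bin(matrix_rows[r] & i).count("1") % 2:
                u |= 1 << r
        pi.append(u)
    return pi


def second_type(a, b, k):
    """pi(x) = a*x + b mod k; a permutation exactly when gcd(a, k) = 1."""
    return [(a * x + b) % k for x in range(k)]


def is_permutation(pi):
    return sorted(pi) == list(range(len(pi)))


if __name__ == "__main__":
    # Bit reversal on 3 bits is linear: row r picks out coordinate m - 1 - r.
    print(first_type([0b100, 0b010, 0b001], 3))
    print(is_permutation(second_type(5, 3, 16)))
```

A singular matrix or a multiplier sharing a factor with $k$ fails the `is_permutation` check, which is a cheap way to validate inputs before using them as structure permutations.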

Before starting with bounding the complexity of linear permutations, let us 
state a useful property of the complexity function: 

Proposition 4. Let $\pi \in S_k$ be a structure permutation, and denote the identity
permutation by Id. Then, for every $S \subseteq [0..k-1]$, $|\partial_\pi(S)| \le |\partial_{Id}(S)| + |\partial_{Id}(\pi(S))|$.
Consequently, for any $\tau \in S_k$, $C_\tau(\pi) \le C_\tau(Id) + C_{\pi \circ \tau}(Id)$.

Proof. The inequality holds since the first term captures the blue boundary,
while the second term captures the red boundary. The consequence follows by
considering the sets $A_i = \tau[0..i]$. ■

We start by showing that linear permutations of the first type have low
complexity. In doing so, we first show that for a special subfamily of such permutations,
the trivial exposure order $\tau = Id$ achieves the desired $O(\sqrt{k})$ bound.

Theorem 7. Let $\pi \in S_k$ be a linear permutation of the first type, and let $A_\pi$
be its $m \times m$ matrix representation. If $(A_\pi^{-1})_{i,j} = 0$ for every $(i,j)$-th entry,
0 < < TO, with 2j < i, then Cid{iT) < 2'^^l"'2'l . 

Proof. Let $e_0, \ldots, e_{m-1}$ be the vectors of the standard basis of $Z_2^m$. We
shall denote by $V_i$ the linear subspace spanned by the vectors $e_0, \ldots, e_{i-1}$. The
addition operation in the $Z_2^m$ vector space will be denoted by $\oplus$.

Assuming = 0 for every 2j < i, we conclude that for every t > 0 it 

holds V) D , and thus D P|^ij . 

Interpreting the linear subspace Vi in terms of the original domain [0..2™ — 1], 
we see that it is actually the subinterval [0..2® — 1]. Thus, we have 7 t([ 0..2* — 1]) A 
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[O.. 2 L 2 J _x]. Furthermore, since A^^Vi is alinear subspace of containing 
every vector u which is obtained from a vector v € by arbitrarily modifying 
its first [|J coordinations also belongs to AT^Vi. This implies that 7 t([ 2®]) is 

composed of subintervals, each one of length at least 2 L 2 J . Keeping in mind that 
the blue boundary of a subinterval in Gjd contains at most two vertices, and 
that the maximal possible number of such subintervals in 7 t([ 0..2® — 1]) is 
we conclude that 

|5,([0..2*-1])| < |a/d([0..2*-l])| + |5/d(7r([0..2*-l]))| < 2 + 2x2Til < 2~-+^ . 

In order to complete the proof we have to bound the size of the boundary 
9,r([0--i ~ 1]) for every t < k (so far, we have taken care only of powers of two). 
Using similar arguments, it is easy to show that |97r([2*p--2*P+2* — 1])| < 2!+^ . 
(Note that if a;, y G [2®p, . . . , 2*p + 2® — 1] then a: © y G [2®] and thus {A.,^{x) © 
A-T^iy)) G U|^i j ). Now, every interval [0..t— 1] can be represented as the union of at 

most m subintervals of the form [2*p, . . . , 2®p+2® — 1], where there is at most one 
subinterval of a given length. Combining the boundaries of those subintervals, 
we establish the bound |9,r([0..t — 1])| < 25+^ < 23+"^. ■ 

We are now ready to prove the bound on the complexity of a general linear
permutation of the first type.

Theorem 8. For a linear permutation $\pi \in S_k$ of the first type it holds
$C(\pi) = O(\sqrt{k})$.

Proof. Keeping in mind that, by Proposition 4,

$$C_\tau(\pi) \le C_\tau(Id) + C_{\pi \circ \tau}(Id) ,$$

it suffices (in view of the previous theorem) to find $\tau$ such that for every $(i, j)$ with
$2j < i$, the following requirements hold: $(A_\tau^{-1})_{i,j} = 0$ and $(A_{\pi \circ \tau}^{-1})_{i,j} = 0$.
Equivalently, we need 

(A,-'), e, = 0 and {A^~^)^ = 0 (3) 

where $(A_\tau^{-1})_i$ is the $i$-th row of $A_\tau^{-1}$. For every $i$, Equation (3) forces $2\lfloor i/2 \rfloor \le i$
linear restrictions on the vector $(A_\tau^{-1})_i$. Thus, for every $i$ there exist at least
$2^{m-i}$ vectors in $(Z_2)^m$ which satisfy the restrictions of (3). Hence it is possible to
construct the transformation $A_\tau^{-1}$ in an inductive manner, starting with $i = m - 1$
and going down to $i = 0$, and calculating for each $i$ the row $(A_\tau^{-1})_i$. Note
that we have to come up with an invertible transformation, and thus at every
stage we must make sure that the new vector $(A_\tau^{-1})_i$ is linearly independent
of the already constructed rows. Since we have $2^{m-i}$ candidates for $(A_\tau^{-1})_i$
satisfying the restrictions of (3), and the number of different linear combinations
of previously constructed vectors is $2^{m-i-1}$, it is always possible to find a legal
candidate which is independent of the old vectors. ■

We now address the linear permutations of the second type.

Theorem 9. Let $\pi$ be a linear permutation of the second type. Then, for some
(explicitly constructed) $\tau \in S_k$, it holds $C_\tau(\pi) \le 4\sqrt{k}$.

Proof. Assume for simplicity that $k$ is a square of an integer; the proof of the
general case can be obtained along similar lines.

Define the sequence $\{x_i\}$ by

$$x_{t\sqrt{k}+r} = \pi^{-1}(r) + t \bmod k ,$$

for every $0 \le t < k$ and $0 \le r < \sqrt{k}$. Now, define the exposure order $\tau$ by
exposing the elements of $Z_k$ in the order they appear in the sequence $\{x_i\}$. In
case the same value appears many times in $\{x_i\}$, we consider only the first
appearance.

Observe that for every $i \ge \sqrt{k}$ there exists $j < i$ such that $\tau(i)$ is a neighbour
of $\tau(j)$ in $G_{Id}$, as $x_{t\sqrt{k}+r} - x_{(t-1)\sqrt{k}+r} = 1$. Therefore, it is possible to partition
$A_i$ into disjoint chains of neighbours, each “rooted” in the interval $[0..\sqrt{k}]$. Thus,
$A_i$ is composed of at most $\sqrt{k}$ intervals. Therefore, $C_\tau(Id) \le 2\sqrt{k}$.

Next, we aim to bound $C_{\pi \circ \tau}(Id)$. Notice that for every $t \ge 0$ and $0 \le r < \sqrt{k}$
the sequence $\pi(x_{t\sqrt{k}}), \ldots, \pi(x_{t\sqrt{k}+\sqrt{k}-1})$ is a continuous interval, since

$$\pi(x_{t\sqrt{k}+r+1}) - \pi(x_{t\sqrt{k}+r}) = \pi(\pi^{-1}(r+1) + t) - \pi(\pi^{-1}(r) + t) = (r+1) - r = 1 .$$

Thus, $A_i^{\pi \circ \tau}$ is also composed of at most $\sqrt{k}$ intervals, which implies
$C_{\pi \circ \tau}(Id) \le 2\sqrt{k}$. Consequently, by Proposition 4, $C_\tau(\pi) \le C_\tau(Id) + C_{\pi \circ \tau}(Id) \le 4\sqrt{k}$. ■
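The construction in the proof can be tried out on small instances. The sketch below builds the exposure order from the sequence $x_{t\sqrt{k}+r} = \pi^{-1}(r) + t \bmod k$ and measures the largest boundary over all prefixes by brute force. It assumes, as elsewhere in these illustrations, that blue edges join $x$ and $x+1$, that red edges join $x$ and $y$ when $\pi(x)$ and $\pi(y)$ are consecutive, and that the boundary of a set consists of its vertices with a neighbour outside it. The function names are hypothetical.

```python
import math


def exposure_order(a, b, k):
    """First appearances in the sequence x = pi^{-1}(r) + t mod k, pi(x) = a*x + b mod k."""
    root = math.isqrt(k)
    assert root * root == k and math.gcd(a, k) == 1
    a_inv = pow(a, -1, k)                      # modular inverse (Python 3.8+)
    seen, order = set(), []
    t = 0
    while len(order) < k:
        for r in range(root):
            x = (a_inv * (r - b) + t) % k      # pi^{-1}(r) + t
            if x not in seen:
                seen.add(x)
                order.append(x)
        t += 1
    return order


def max_boundary(a, b, k, order):
    """Largest boundary over all prefixes of `order`, computed by brute force."""
    a_inv = pow(a, -1, k)

    def neighbours(x):
        p = (a * x + b) % k
        nbrs = {y for y in (x - 1, x + 1) if 0 <= y < k}                 # blue
        nbrs |= {(a_inv * (q - b)) % k for q in (p - 1, p + 1) if 0 <= q < k}  # red
        return nbrs

    exposed, best = set(), 0
    for v in order:
        exposed.add(v)
        size = sum(1 for u in exposed if not neighbours(u) <= exposed)
        best = max(best, size)
    return best


if __name__ == "__main__":
    a, b, k = 5, 3, 64
    order = exposure_order(a, b, k)
    print(max_boundary(a, b, k, order), 4 * math.isqrt(k))
```

The brute-force boundary computation is quadratic in $k$, which is adequate for checking the $4\sqrt{k}$ bound on small examples but not intended for large inputs.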
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Abstract. We consider exact learning of monotone Boolean functions 
by membership queries, in the case that only $r$ of the $n$ variables are
relevant. The learner proceeds in a number of rounds. In each round
he submits to the function oracle a set of queries which may be chosen
depending on the results from previous rounds. In a STOC'98 paper we
proved that $O(2^r + r \log n)$ queries in $O(r)$ rounds are sufficient. While
the query bound is optimal for trivial information-theoretic reasons, it
was open whether parallelism can be improved without increasing the
number of queries. In the present paper we prove a negative answer: $\Theta(r)$
rounds are necessary in the worst case, even for learning a very special
type of monotone function. The proof is an adversary argument, based
on a distance inequality in binary codes. On the other hand, a Las Vegas
strategy based on another STOC'98 result can learn monotone functions
in $2 \log_2 r + O(1)$ rounds, without using significantly more queries. We
also study the constant factors in the deterministic case.



1 Introduction and Contributions 

In the widely known model of exact learning by membership queries, a Boolean
function $f$ on $n$ variables is given as an oracle (“black box”), and a learner wants
to identify $f$. To this end he may ask queries of the following type: He chooses
an assignment (giving Boolean value 0 or 1 to each variable), and the oracle provides
the value of $f$ for this assignment. Trivially, all $2^n$ possible queries must be
asked if nothing about $f$ is known in advance. However, clever query strategies
can exist if it is promised to the learner that $f$ belongs to some restricted class of
Boolean functions. There are trivial classes where still $2^n$ queries are necessary,
e.g. the class of functions that have value 1 for exactly one assignment. Even
randomization cannot help in such cases. (Due to some recent fascinating results,
quantum computers can solve search problems by surprisingly few queries, sub- 
ject to some error probability; see e.g. the survey article P!- But in the present 
paper we remain in the classical setting.) 

Here we are concerned with a function class which can be efficiently learned
by membership queries, namely monotone Boolean functions where at most $r$ of
the $n$ variables are relevant. A Boolean function $f$ is monotone if $\forall i : x_i \le y_i$
implies $f(x_1, \ldots, x_n) \le f(y_1, \ldots, y_n)$. A variable is irrelevant if switching its value
in arbitrary assignments never changes the function value. Otherwise it is a
relevant variable. In the literature, the term attribute-efficient learning refers to
learning strategies whose complexity is bounded by certain functions in both $n$
and $r$, usually by $\log n$ and a polynomial in the length of a representation (which
may be $2^r$).

We consider learning in rounds. In each round, the learner chooses a set 
of queries (assignments), and sends all these queries in parallel to the oracle. 
Then he may perform any computations with the obtained function values. In 
particular, the choice of the query set for each round may depend on the results 
obtained in previous rounds. In a randomized strategy, it may also depend on 
random bits. Learning in one round is called nonadaptive learning. 



Various aspects of attribute-efficient learning have been studied in mm, 
yi .'tfl 4|1 tifl !Sf‘21)f2Tj : this list is not exhaustive. An important special case of 
attribute-efficient learning is group testing, i.e. function / is known to be the 
disjunction of the relevant variables; see e.g. |7fiSft)p 1 2fTnj . Group testing, as well 
as attribute-efficient learning in general, has interesting applications in fields
like chemical and biological test series, error search in hardware and software,
and pattern recognition; we refer to the several pointers in the above-mentioned
papers. Parallelism is essential to applications where the tests (queries)
are time-consuming but can be executed simultaneously iiEim. 

In [5] we devised a strategy that learns monotone functions with $r$ relevant
variables, using a total of $O(2^r + r \log n)$ queries in $O(r)$ rounds. It should be noticed
that the learner is not assumed to know $r$ in advance. The query bound is
optimal for trivial information-theoretic reasons. In the present paper we prove
that the number of rounds is also optimal, in the following sense: For deterministic
strategies, there is an exponential tradeoff between the number of rounds
and the coefficient of $\log_2 n$ in the query number. Consequently, any strategy
that uses an optimum number of queries needs $\Theta(r)$ rounds in the worst case.
The result is even true for a specific monotone Boolean function whose structure
may be told to the learner; only the location of the relevant variables must be
kept secret. Our proof is an adversary argument, using an inequality for Hamming
distances in binary codes. Due to this lower bound result, it makes sense
to study the constant factors in nearly query-optimal strategies. Refining our 
strategy from 0, we obtain some concrete constants. 

On the other hand, we give a randomized strategy that learns monotone 
functions by -\-0{r log n) expected queries in only 2 log 2 r -|- 0(1) expected 
rounds. (In fact, this remains true for more general function classes.) It is based 
on another result from ^ on nonadaptive attribute-efficient learning of arbi- 
trary Boolean functions and uses standard ideas from hashing and the doubling 
technique. 



2 Lower Bound for Queries vs. Rounds 

Recall that the Hamming distance of two bit vectors is the number of positions
where the vectors disagree. It is a straightforward exercise to prove that among
any three binary words of equal length there exist two words whose Hamming
distance is at most 2/3 the length. This already enables us to prove a weaker version
of Lemma 1 below. However, a stronger estimate on the minimum Hamming
distance $d$ between $n$ binary code words of length $q$ is known, namely: $n \le \frac{2d}{2d-q}$.
This is the Plotkin bound; see m or any textbook on error-correcting codes. It 
follows that $d \le \frac{n}{2(n-1)}\, q$.

Lemma 1. There exists an adversary strategy such that the learner can identify
at most $b \log_2 s$ relevant variables in each round of $s \log_2 n$ queries, where
$\lim_{n \to \infty} b = 1$.

Proof. The adversary chooses the following monotone Boolean function $f$. The $r$
relevant variables of $f$ are indexed $x_1, x_2, \ldots, x_r$, and the DNF contains all terms
consisting of an $x_j$, $j$ odd, along with all $x_i$, $i < j$, $i$ even. That means, $f$ can
be written as

$$x_1 \vee x_2 x_3 \vee x_2 x_4 x_5 \vee x_2 x_4 x_6 x_7 \vee x_2 x_4 x_6 x_8 x_9 \vee \ldots$$

The adversary may even betray that the function is of this type, but he does
not tell the learner which variables are the relevant ones.

We discuss the crucial property of this function: $x_1 = x_2$ implies that $f$
has the same value as $x_1$. Moreover, $x_1 = 1$ implies $f = 1$. By the self-similar
structure of this function, this repeats with later variables: If $j$ is odd, $x_i = 0$
for all odd $i < j$, and $x_i = 1$ for all even $i < j$, then $x_j = x_{j+1}$ implies that $f$
gets the same value as $x_j$.
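The properties just listed can be verified exhaustively for a small number of relevant variables. The following sketch evaluates the adversary's DNF directly; the function name `adversary_f` and the tuple encoding of assignments are choices made here for illustration.

```python
from itertools import product


def adversary_f(x):
    """Evaluate x_1 v x_2x_3 v x_2x_4x_5 v ... on x = (x_1, ..., x_r), 0/1 entries."""
    r = len(x)
    for j in range(1, r + 1, 2):                            # odd j
        evens_set = all(x[i - 1] for i in range(2, j, 2))   # all even i < j
        if x[j - 1] and evens_set:
            return 1
    return 0


if __name__ == "__main__":
    r = 7
    ok = True
    for x in product((0, 1), repeat=r):
        if x[0] == x[1]:
            ok = ok and adversary_f(x) == x[0]              # x_1 = x_2 forces f = x_1
        for j in range(3, r, 2):                            # later odd j with j + 1 <= r
            prefix = (all(x[i - 1] == 0 for i in range(1, j, 2))
                      and all(x[i - 1] == 1 for i in range(2, j, 2)))
            if prefix and x[j - 1] == x[j]:
                ok = ok and adversary_f(x) == x[j - 1]
    print(ok)
```

The reason the check succeeds is visible in the DNF: every term for a later odd index contains $x_{j+1}$, and every earlier term contains an odd-indexed variable that the prefix condition sets to 0.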

Assume that the learner asks $s \log_2 n$ queries in the first round. The query
set can be considered as a binary code of length $q = s \log_2 n$, assigning a code
word to each variable. Exploiting the Plotkin bound, the adversary chooses two
words (i.e. variables) $y, z$ which differ in at most $\frac{s}{2} \log_2 n$ bits, and settles that
$\{y, z\}$ is the pair of variables $\{x_1, x_2\}$. (Actually, as seen above, the guaranteed
fraction is slightly larger than 1/2, but the error tends to 0 with $n \to \infty$. For
notational convenience we pretend henceforth that half of the bits are different.)
Next we describe how the adversary answers the queries of the round. For each
query where $x_1$ and $x_2$ get equal values, of course, the adversary outputs 1 if
both $x_1$ and $x_2$ are 1 there, and 0 otherwise. The point is that these queries
do not provide any information about the remaining relevant variables: No
matter which variables are the $x_i$, $i \ge 3$, the adversary's answers are consistent.
In other words, the learner can exploit at most the other $\frac{s}{2} \log_2 n$ queries in order
to recognize further relevant variables. Moreover, the latter subset of queries can
be split into two subsets, one with $y = 1$, $z = 0$, and one with $y = 0$, $z = 1$. If the
former subset is the majority then the adversary will fix $x_1 = y$, $x_2 = z$, and vice
versa in the other case. Similarly as above, queries with $x_1 = 1$ are answered
by 1 and provide no information about further relevant variables. Thus, even
worse, the learner can exploit at most $\frac{s}{4} \log_2 n$ queries (where $x_1 = 0$, $x_2 = 1$) to
recognize further relevant variables.

The adversary considers this subset of queries again as a (shorter) binary
code, and chooses two variables of minimum Hamming distance to be $x_3$ and $x_4$,
and so on. Then the above argument can be repeated. After $\log_4 s$ steps, there
remains a code of length $\log_2 n$ or less, hence the adversary may even choose two
of the $n$ variables which have not been distinguished by the remaining queries.
We conclude that the learner can identify only the first $2 \log_4 s = \log_2 s$ relevant
variables in the first round, without having gained further information on the
other ones.

Clearly, this argument also applies to the following rounds, always starting
with the earliest $x_j$ not recognized yet. •



Now it is easy to derive the following lower bound. 

Theorem 1. Any deterministic strategy that learns monotone Boolean functions
with $r$ relevant variables in $k$ rounds needs at least $k c^{r/k} \log_2 n$ queries in
the worst case, where $\lim_{n \to \infty} c = 2$.

Proof. Let $r_i$ be the number of relevant variables learned in the $i$-th round, provided
that the adversary follows the strategy of Lemma 1. Then we know that
the $i$-th round consists of at least $2^{r_i/b} \log_2 n$ queries. Since $\sum_i r_i = r$ and $2^{t/b}$ is a
convex function in $t$, the lower bound $\sum_i 2^{r_i/b} \log_2 n$ for the total query number
is minimized if $r_i = r/k$ for all $i$. Thus the result follows with $c = 2^{1/b} \to 2$. •
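The convexity step can be written out as follows. This is a sketch of the standard Jensen argument using the symbols of the proof; the auxiliary quantity $g = r/k$ and the constant $C$ are introduced here only to show how the bound forces many rounds.

```latex
% Jensen's inequality for the convex function t \mapsto 2^{t/b}, with \sum_i r_i = r:
\sum_{i=1}^{k} 2^{r_i/b} \log_2 n
  \;\ge\; k \cdot 2^{\frac{1}{k}\sum_{i} r_i / b} \log_2 n
  \;=\; k \bigl(2^{1/b}\bigr)^{r/k} \log_2 n
  \;=\; k\, c^{\,r/k} \log_2 n .

% If the coefficient of \log_2 n is to stay within a constant factor C of r,
% write k = r/g. Then
k\, c^{\,r/k} \le C r
  \;\Longleftrightarrow\; \frac{c^{\,g}}{g} \le C ,
% which bounds g by a constant, i.e. k = r/g must grow linearly in r.
```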



Corollary 1. Any deterministic strategy using (some constant factor within)
the optimum number of queries can be forced to spend $\Omega(r)$ rounds. •

Theorem 1 gives a lower bound for the queries vs. rounds tradeoff. It is an
open problem whether this bound is tight. In other words: Does there exist a
learning strategy for monotone Boolean functions which needs only log 2 n 
queries, for any number k of rounds between 1 and 0(r)? The difficulty is to 
recognize some proper subset of relevant variables by a restricted number of 
nonadaptive queries. The presence of further relevant variables and an unlucky 
choice of queries may obscure their relevance, cf. m- 

3 Randomization Helps 

In this section we show that a randomized strategy can achieve both coefficient 
r in the query number and much less than r rounds. We have already applied a 
similar technique in for a different problem related to attribute-efficient learn- 
ing. (See also [Z] for a randomzied solution of a search problem which provably 
does not allow an efficient deterministic strategy.) 

As a preparation we recall the main result of jSl , which has a proof of several 
pages: 

Theorem 2. Arbitrary Boolean functions with at most $s$ relevant variables can
be learned by $O(s^2 2^s + s 2^s \log n)$ queries in one round, if $s$ is known in advance.
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Using this upper bound, we can prove a result which contrasts nicely to the 
deterministic case. Before this, we explain the notion of coarsening of a Boolean
function $f$: Assume that the set of variables is partitioned into so-called bins.
The coarsening $g$ of $f$ with respect to this partition is defined as follows. The
variables of $g$ are the bins, and for any assignment of Boolean values to the bins,
the value of $g$ equals the value of $f$ if each variable inherits the value assigned
to the bin it belongs to. In particular, an oracle for $f$ can be used as an oracle
for $g$. Empty bins (containing no variables of $f$) are considered as irrelevant.
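The remark that an oracle for $f$ serves as an oracle for $g$ amounts to a one-line wrapper. The sketch below is illustrative: `coarsen` and `random_bins` are hypothetical names, and an assignment is represented as a tuple of 0/1 values.

```python
import random


def coarsen(f, bin_of):
    """Return the coarsening g of f, where bin_of[i] is the bin holding variable i."""
    def g(bin_values):
        # Every variable inherits the value assigned to its bin.
        return f(tuple(bin_values[b] for b in bin_of))
    return g


def random_bins(n, num_bins, rng):
    """Throw n variables independently and uniformly at random into num_bins bins."""
    return [rng.randrange(num_bins) for _ in range(n)]


if __name__ == "__main__":
    f = lambda x: x[0] | (x[2] & x[3])      # a small monotone example function
    g = coarsen(f, [0, 1, 1, 2])            # variables 1 and 2 share bin 1
    print(g((0, 1, 1)), g((0, 1, 0)))
```

Each query to `g` costs exactly one query to `f`, so query counts for the coarsening carry over unchanged.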

Note that, in the following strategy, the learner is not assumed to have prior 
knowledge about r. 

Theorem 3. There is a Las Vegas strategy for learning monotone Boolean func- 
tions using -\-0{r\ogn) expected queries in 21 og 2 r -1-0(1) expected rounds, 
if r of the n variables are relevant. 

Proof. For $s = 1, 2, 4, 8, 16, \ldots$ perform 3 rounds as described below, until the
termination criterion in (3) is fulfilled:

(1) Throw the $n$ variables at random into $2^{3s}$ bins. Then consider the induced
coarsening $g$ of $f$, and apply a strategy due to Theorem 2 to learn $s$ relevant
bins by $O(s^2 2^s)$ queries. Note that this step fails if still $s < r$ and if $g$ should
have more than $s$ relevant bins.

(2) Search for one relevant variable in each relevant bin. If a bin contains
exactly one relevant variable, this can be done by $\log_2 n$ queries in one round,
and for all such bins in parallel. (Details need some care, but are quite obvious.)
This is a total of at most $s \log_2 n$ queries.

(3) Test whether all relevant variables have been found, otherwise double
$s$ and repeat. For this termination test, try all possible assignments $y$ on the
detected relevant variables and assign 0 to all remaining variables. Similarly,
assign 1 to all remaining variables. These are at most $2^{s+1}$ queries. Due to
monotonicity of $f$, no further relevant variables exist if and only if, for every $y$,
the all-0 and the all-1 assignment give the same function value.
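The termination test in step (3) can be sketched as follows, assuming the oracle is a Python callable on tuples of 0/1 values and that variables are indexed from 0; the name `all_relevant_found` is hypothetical.

```python
from itertools import product


def all_relevant_found(f, n, detected):
    """For every assignment y to the detected variables, compare f with all
    remaining variables set to 0 against f with all remaining variables set to 1."""
    detected = list(detected)
    for y in product((0, 1), repeat=len(detected)):
        low = [0] * n
        high = [1] * n
        for idx, val in zip(detected, y):
            low[idx] = val
            high[idx] = val
        if f(tuple(low)) != f(tuple(high)):
            return False          # some undetected variable still matters
    return True


if __name__ == "__main__":
    f = lambda x: x[0] | (x[2] & x[4])       # monotone, relevant variables 0, 2, 4
    print(all_relevant_found(f, 6, [0, 2, 4]))
    print(all_relevant_found(f, 6, [0, 2]))
```

The test issues two queries per assignment $y$, which matches the $2^{s+1}$ count when $s$ variables have been detected. Its correctness relies on monotonicity: the all-0 and all-1 completions bracket every other completion.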

Although the first rounds may fail, (3) ensures correctness of the final outcome.
Consider the triples of rounds when $s \ge r$ is already reached. These triples
are called trials in the following. As soon as the $r$ relevant variables get into $r$
different bins in a trial, all relevant bins will be found in (1), and all the
relevant variables will be found in (2), which is then verified in (3). Hence a failure
in a trial appears only if the relevant variables are not thrown into $r$ distinct
bins. Therefore the failure probability is at most $r^2/2^{3s}$.

The first trial needs queries in (1) and (3), and O(rlogn) queries in (2). 
Since s is always doubled, the query number in all previous rounds (with s < r) 
is within these bounds. After a failure in a trial with parameter s, the next trial, 
with parameter 2s, will ask 0(4s^2^®) queries in (1) and (3), and 2slog2 n queries 
in (3). However it is performed with probability less than r^/2^®. (Actually, this 
is a generous bound.) That means, the expected query number is bounded by 
0(r^s^/2®-|-r^s(logn)/2^®). The sum of these terms over all trials is convergent. 
hence the first trial dominates the expected query number. The expected number
of rounds is obviously $2 \log_2 r + O(1)$. □

We conclude this section with a number of remarks. 

(i) If r is known prior to learning then, obviously, the same query number
can be achieved in $O(1)$ rounds. The only critical point is to guess r.

(ii) We have used 2^® bins for ease of presentation only; a smaller number of 
bins would be sufficient. Moreover, we may simplify the strategy: if the prospective
number of bins exceeds n, we may consider f itself instead of a coarsening,
so that the failure probability is 0.

(iii) Note that monotonicity is not really exploited. We used it only in (3), to
test for the existence of further relevant variables. The same strategy is applicable
to any class of Boolean functions which has the following properties: The class
is closed under projection (i.e. fixing partial assignments), and it admits an
$O(1)$-query test deciding whether a function from the class is constant.

(iv) The use of Theorem |3 is essential to our strategy. We do not see how to 
avoid it. 

(v) Our proof in |S( shows that a family of 0(s^2® -|- s2®logn) random as- 
signments is sufficient for identifying functions with at most s relevant variables, 
with high probability. Since our strategy is highly random anyway, we may use 
random assignments in (1), instead of explicitly constructed families (which is 
apparently a very difficult matter). 

(vi) Another important issue besides pure query complexity is the computational
complexity, i.e. the amount of auxiliary computation to identify f from
the answers given by the oracle. The only problem lies in the application of Theo- 
rem |2 For satisfactory solutions we refer to j0|. In particular, we can derive time 
bounds like $O(\exp(r) + n \log n)$, which resembles the notion of fixed-parameter
tractability.

4 The Coefficients in Nearly Query-Optimal 
Deterministic Strategies 

In view of Corollary ^ it is now worthy to study the constant factors u,v,w in 
deterministic strategies using $u 2^r + v r \log_2 n$ queries in $wr$ rounds.

First we observe $u \ge 2$: Even if the learner gets the relevant variables for free,
$2^{r+1}$ queries are necessary to verify that these are in fact the only relevant
variables. Namely, for each assignment on the alleged relevant variables, two queries
must be asked, where all other variables are 0 or 1, respectively. On the other
hand, a total of $2^{r+1}$ queries during the whole learning process (that is, $u = 2$)
is sufficient for testing whether all relevant variables have been found, by a
simple argument: Even if some termination tests indicate further relevant variables,
the learner does not have to repeat earlier queries of this type. Altogether,
u is the least interesting constant. For v we have the trivial information-theoretic
lower bound $v \ge 1$. The question arises how small v can actually be made.
The following strategy has $v = 2 + o(1)$, $w = 3$, and $u = 2$. The skeleton of
this strategy has been given in but there we did not worry about constant
factors, therefore some refinements of the strategy itself and of the analysis are 
necessary. 

We use some loose but convenient notation: An assignment x is identified with
the set W of variables having value 1 in x. Consequently, we write f(W) for
f(x), and to “query a set W” means to ask f(W).

Theorem 4. Monotone Boolean functions with at most r relevant variables can
be learned by $2^{r+1} + (2 + o(1))\, r \log_2 n$ queries in $3r$ rounds.

Proof. Let f be the given function and V its set of n Boolean variables. Assume
$f(\emptyset) = 0$ and $f(V) = 1$, otherwise f is constant, and we are done.

We arrange the variables as a q-hypercube of dimension $q = \lceil \log_2 n \rceil$; some
irrelevant dummy variables may be added if n is not a power of 2. This naturally
defines q pairs of (q-1)-hypercubes. In the first round we query q of these
(q-1)-hypercubes, exactly one from each pair. If W has been queried and f(W) = 0
then W becomes a so-called lower set, and $V \setminus W$ becomes an upper set. Similarly,
if f(W) = 1 then W is declared to be an upper set, and $V \setminus W$ is a lower set.

There follow two rounds with q assignments formed in the following way (but
not all of them will be queried). Arrange the upper sets in arbitrary order. Then
the candidate sets to be queried are the intersections of the first i upper sets,
for i = 1, ..., q. We partition this chain of nested sets into approximately $\sqrt{q}$
segments of length about $\sqrt{q}$. In the second round, we query the first set
(containing a single variable v) and the $\sqrt{q}$ sets bounding the segments. If $f(\{v\}) = 1$
then v is relevant. Hence we have found some relevant variable after 2 rounds
with $q + \sqrt{q}$ queries.

Consider the other case $f(\{v\}) = 0$. There must be a jump from f = 0 to
f = 1 in our chain. In the third round we query the $\sqrt{q}$ sets of the segment
where this jump occurred. Let W and $W \cup W'$ be those neighboring sets in our
chain with $f(W) = 0$ and $f(W \cup W') = 1$. By construction, there is a unique
pair of (q-1)-hypercubes H and H' such that: $W \subseteq H$, $W' \subseteq H'$, H is an
upper set, and H' is a lower set.
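
The search for the jump along the chain can be sketched abstractly in Python. The sketch assumes a hypothetical callable `query(i)` that returns the function value on the i-th set of a chain of length q, with value 0 at one end and 1 at the other; it first asks the segment boundaries and then the inside of the one segment containing the jump, as in the second and third rounds above.

```python
import math

def find_jump(query, q):
    """Return i with query(i) == 0 and query(i + 1) == 1, using two rounds.

    Assumes query(0) == 0 and query(q - 1) == 1 for indices 0..q-1.
    """
    step = max(1, math.isqrt(q))
    # First round of the search: index 0 and the segment boundaries.
    bounds = sorted(set(range(0, q, step)) | {q - 1})
    answers = {i: query(i) for i in bounds}
    hi = next(i for i in bounds if answers[i] == 1)
    lo = bounds[bounds.index(hi) - 1]
    # Second round of the search: everything strictly inside that segment.
    for i in range(lo + 1, hi):
        answers[i] = query(i)
    return next(i for i in range(lo, hi)
                if answers[i] == 0 and answers[i + 1] == 1)

# Hypothetical chain of length 16 whose values jump between index 6 and 7.
values = [0] * 7 + [1] * 9
```

Both rounds together ask roughly $2\sqrt{q}$ queries, independent of where the jump lies.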

We have two subcases. If $f(H) = f(H')$ then one easily sees that both H and
H' contain at least one relevant variable. (Since f is monotone, this holds for any
complementary pair of sets.) If $f(H) \neq f(H')$ then obviously $f(H') = 0$. Since
f is monotone, this implies $f(W') = 0 = f(W)$. Together with $f(W \cup W') = 1$,
we conclude as above that both W and W' contain relevant variables.

In order to distinguish these subcases, the learner has to query both H and
H'. In the first round, only one of them has been queried. The second query
might be asked in a fourth round, but we can save this round by asking all these
partner queries in the interesting segment already in the third round. These are
$\sqrt{q}$ additional queries. In summary, we have found two disjoint subsets containing
relevant variables, after 3 rounds with $q + 3\sqrt{q}$ queries. This situation is called
a splitting.

The whole process described above is recursively applied on the mentioned
subsets, while the suitable assignment on the remaining variables is fixed.
(Details should be clear.) This yields a binary tree whose nodes are subsets of
variables, and whose leaves contain one detected relevant variable each. If no further
splitting is made then the tree is ready, and we test whether all relevant
variables already appear as leaves. This termination test was described prior to the
theorem, and we argued that $2^{r+1}$ queries are needed during the whole search. In
the negative case we find some non-constant “projection” of f and can continue
the process with a new tree.

Altogether we obtain a sequence of binary trees. The total number of inner
nodes is at most r - 1, and the sum of depths is at most r. An inner node
represents 3 rounds, a leaf represents 2 rounds, but the termination tests add a
further round to the deepest leaves in each tree. Hence every node represents 3
rounds, which gives w = 3 in the worst case. The number of queries (excluding
those of the termination tests) is bounded by

$$r(q + \sqrt{q}) + (r - 1)(q + 3\sqrt{q}) = (2r - 1)q + (4r - 3)\sqrt{q} = (2 + o(1))\, r q.$$

(For $\log n > 16 r^2$ we even have $v < 2$.) □



Once more, there remain some open questions. First, we conjecture that
v = 2 is optimal. Furthermore, we do not know any positive constant lower
bound for w. We only have an alternative strategy with w = 2, but v = 6 and
u = r. (Here u is no longer constant, since we perform termination tests for many
variables which are only suspected to be relevant.) But it is not clear whether w
can be made arbitrarily small, at the cost of a large constant v. The same adversary
construction as in Lemma^ gives some lower bound for the v — w— tradeoff, but 
we have no strategy which achieves the corresponding upper bound. The problem 
is of similar nature as that mentioned at the end of Section 2. 
Abstract. Given a graph $G = (V, E)$ and a set of vertices $M \subseteq V$,
a vertex $v \in V$ is said to be controlled by M if the majority of v’s
neighbors (including itself) belongs to M. M is called a monopoly if
every vertex $v \in V$ is controlled by M. For a specified M and a range for
E ($E_1 \subseteq E \subseteq E_2$), we try to determine E such that M is a monopoly in
$G = (V, E)$. We first present a polynomial algorithm for testing if such
an E exists, by formulating it as a network flow problem. Assuming that
a solution E does exist, we then show that a solution with the maximum
or minimum $|E|$ can be found in polynomial time, by considering them
as weighted matching problems.

In case there is no solution E, we want to maximize the number of vertices 
controlled by the given M . Unfortunately, this problem turns out to be 
NP-hard. We therefore design a simple approximation algorithm which 
guarantees an approximation ratio of 2. 



1 Preliminary 

Let $G = (V, E)$ be an undirected graph, where V (resp., E) is the vertex (resp.,
edge) set. We assume that G is simple, i.e., G contains neither self-loops nor
parallel edges. For a vertex $v \in V$, let us define the neighborhood of v by
$N_G(v) = \{v\} \cup \{w \mid (w, v) \in E\}$. A vertex $v \in V$ is said to be controlled by a
vertex set $M \subseteq V$ if the majority of its neighbors is in M, i.e.,

$$|N_G(v) \cap M| \ge |N_G(v)|/2. \qquad (1)$$

Here we use a non-strict majority (including equality); however, all results
obtained in this paper hold for the strict majority as well. For a vertex set $M \subseteq V$,
let $\mathit{Cont}(G, M)$ denote the set of vertices of G controlled by M. We call M a
monopoly if it controls every vertex in the graph G, i.e., $\mathit{Cont}(G, M) = V$.
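
The definitions of control and monopoly translate directly into a short Python sketch. The function names and the path example are hypothetical; edges are given as pairs of vertices.

```python
def closed_neighborhood(vertices, edges):
    """N_G(v) = {v} together with all vertices adjacent to v."""
    nbrs = {v: {v} for v in vertices}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs

def controlled(vertices, edges, M):
    """Cont(G, M): vertices v with |N_G(v) & M| >= |N_G(v)| / 2."""
    M = set(M)
    nbrs = closed_neighborhood(vertices, edges)
    return {v for v in vertices if 2 * len(nbrs[v] & M) >= len(nbrs[v])}

def is_monopoly(vertices, edges, M):
    return controlled(vertices, edges, M) == set(vertices)

# Hypothetical example: the path a - b - c.
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c")]
```

On this path, M = {b} controls both end vertices but not b itself, while M = {a, b} is a monopoly under the non-strict majority of (1).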

The notion of monopoly was introduced by N. Linial et al. jS] to understand 
local majority voting in distributed computing. Local majority voting is mo- 
tivated, for example, by agreement problems in agent systems (e.g., i2HTO)- 
Let us consider the problem for the agents to agree on a standard from among 
some proposals mi Under the assumption that every agent knows who supports
which proposal, the agreement can be made simply by, for example, taking the 
one that the majority of the agents supports. The agent system, however, could 
be distributed too widely to admit of this solution, and this is why they sug- 
gested heuristic algorithms based on partial information on the distribution of 
the agents’ votes. 

For simplicity, suppose that there are two proposals, 0 and 1, and that the 
proposal that the majority of the agents supports is to be selected as the stan- 
dard. Given, for each agent, the group of neighboring agents whose opinions 
are available to it, a simple and natural heuristic to approximate the agreement 
would be to take the majority of the opinions available to it, i.e., the opinions 
of its neighbors and itself. This is called the deterministic local majority polling 
system. 

Peleg and his colleagues recently investigated such a system and determined 
how many agents supporting, say 0, are necessary and sufficient for the agreement 
to result in 0 PSC2I . They model the system by an undirected graph G = (V,E), 
where V and E respectively represent the set of agents and the (symmetric) 
neighborhood relation. Now, we can easily see that all agents decide on 0 if 
and only if there is a monopoly M in G whose members all support 0. In the 
deterministic local majority polling systems, ruling a monopoly M implies ruling 
V (i.e., all agents), and therefore, monopolies play an important role in such 
systems. Some other applications are discussed in M 

Linial et al. 0 discussed the problems related to monopolies as packing and 
covering problems on graphs. They showed that $|M|$ is $\Omega(\sqrt{n})$ and gave a graph
with a monopoly M of size where n = \V\. As for computational com- 

plexity, Peleg [El showed that the problem of computing a minimum monopoly 
is NP-hard, by reducing the minimum dominating set problem to it. Based on the 
non-approximable results 00 on the set cover problem and its variants, includ- 
ing the minimum dominating set problem, the following conjecture is plausible: 
For any real $\varepsilon > 0$, the minimum monopoly problem has no $(\ln n - \varepsilon)$
approximation unless $NP \subseteq \mathrm{DTIME}(n^{O(\log \log n)})$. On the other hand, it is known that a
greedy algorithm yields a $(\ln |V| + 1)$ approximation for the minimum monopoly
problem. Bermond and Peleg 0 studied some of its modifications, “r-monopoly” 
and “self-ignoring” monopoly. Repetitive versions of the local majority polling 
system were also discussed by several authors IIVIlOll 111.^1 . 

As mentioned above, we can rule the whole system by ruling a small monopoly. 
By this property, in some applications, a monopoly is a favorite concept, and 
a (smart) way of monopolizing a given set by modifying the system topology 
is looked for. Motivated by them, in this paper, we first consider the following 
problem: 
Monopoly Verification

Input: Two graphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ such that $E_1 \subseteq E_2$,
and a vertex set $M \subseteq V$.

Question: Does there exist a graph $G = (V, E)$ such that (1) $E_1 \subseteq E \subseteq E_2$
and (2) M is a monopoly in G?



In case of YES, we want to compute a graph G with some additional property. 
Among such properties, we consider the maximality and minimality. 



Max-Neighborhood Monopoly

Input: Two graphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ such that $E_1 \subseteq E_2$,
and a vertex set $M \subseteq V$.

Output: A graph $G = (V, E)$ such that (1) $E_1 \subseteq E \subseteq E_2$, (2) M is a
monopoly in G, and (3) M is not a monopoly in any $G' = (V, E')$ with
$E_1 \subseteq E' \subseteq E_2$ and $|E'| > |E|$, if such an E exists; NO, otherwise.



Min-Neighborhood Monopoly

Input: Two graphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ such that $E_1 \subseteq E_2$,
and a vertex set $M \subseteq V$.

Output: A graph $G = (V, E)$ such that (1) $E_1 \subseteq E \subseteq E_2$, (2) M is a
monopoly in G, and (3) M is not a monopoly in any $G' = (V, E')$ with
$E_1 \subseteq E' \subseteq E_2$ and $|E'| < |E|$, if such an E exists; NO, otherwise.



Let us assume that the current system topology is represented by $G_2 = (V, E_2)$
(resp., $G_1 = (V, E_1)$). Then the members of M try to find the minimum-cost links
in $E_2 - E_1$ so that the topology obtained by removing (resp., adding)
such links secures the adoption of a proposal of M, if the members of M pay for
the cost of breaking (resp., establishing) links in $E_2 - E_1$. This corresponds to
the max-neighborhood monopoly (resp., min-neighborhood monopoly) problem.

We note that, if $E_1 = \emptyset$, then the max-neighborhood monopoly problem
is a maximum subgraph problem (or a minimum edge-deletion problem). On
the other hand, if $G_2$ is complete (i.e., $G_2 = K_n$, where $n = |V|$), the
min-neighborhood monopoly problem is a minimum edge-augmentation problem.

Let us then consider the case in which the answer to the monopoly verification 
problem is NO. In this case, we want to compute a graph G in which M controls 
as many vertices in V as possible. 



Max Controlled Set

Input: Two graphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ such that $E_1 \subseteq E_2$,
and a vertex set $M \subseteq V$.

Output: A graph $G = (V, E)$ such that (1) $E_1 \subseteq E \subseteq E_2$ and
(2) $|\mathit{Cont}(G, M)| \ge |\mathit{Cont}(G', M)|$ holds for all $G' = (V, E')$ with $E_1 \subseteq E' \subseteq E_2$.
Section 2 studies the monopoly verification problem. We show that it is
polynomially solvable by reducing it to the network flow problem. In Sections 3
and 4 we consider the max- and min-neighborhood monopoly problems, respectively.
Although both problems are more general than the monopoly verification
problem, we show that they are also polynomially solvable. Finally, Section 5
investigates the max controlled set problem. Contrary to the previous problems,
it turns out to be intractable, even if either $G_1$ is empty or $G_2$ is complete. We
finally present a simple approximation algorithm which guarantees an
approximation ratio of 2.

For space reasons, proofs of some results are omitted. 

2 Monopoly Verification Problem 

For a graph $G = (V, E)$ and $A, B \subseteq V$, let
$E(A, B) = (A \times B) \cap E = \{(v, w) \in E \mid v \in A, w \in B\}$.
If $A = \{v\}$ (resp., $B = \{w\}$), then we simply write $E(v, B)$
(resp., $E(A, w)$) instead of $E(A, B)$.

Suppose that M is a monopoly in a graph $G = (V, E)$. Let $U = V \setminus M$ and
$D = E_2 \setminus E_1$. We consider the following two modifications on E: (1) adding edges
in $D(M, M) = (M \times M) \cap D$ to E and (2) deleting edges in $D(U, U)$ from E.
Since these modifications do not affect the condition that M be a monopoly in
the graph G, we can assume in this section that the edge set E satisfies

$$E \supseteq E_1 \cup D(M, M), \qquad (2)$$

$$E \subseteq E_1 \cup D(M, M) \cup D(U, M). \qquad (3)$$

For a vertex $v \in U$, let

$$\mathit{deficit}(v) = |N_{G_1}(v) \cap U| - |N_{G_1}(v) \cap M|. \qquad (4)$$

By definition, v is controlled by M in $G_1$ if and only if $\mathit{deficit}(v) \le 0$. Let $U_{>}$
and $U_{\le}$ be the sets of vertices $v \in U$ such that $\mathit{deficit}(v) > 0$ and $\mathit{deficit}(v) \le 0$,
respectively. Then $v \in U_{\le}$ is controlled by M in any graph G with E ($\supseteq E_1$)
that satisfies property (3). Thus, we can restrict E to

$$E(U_{\le}, M) = E_1(U_{\le}, M) \quad \text{(i.e., } E \subseteq E_1 \cup D(M, M) \cup D(U_{>}, M)\text{)}. \qquad (5)$$

Let $G^+ = (V, E_1 \cup D(M, M))$. For a vertex $v \in M$, let

$$\mathit{surplus}(v) = |N_{G^+}(v) \cap M| - |N_{G^+}(v) \cap U|. \qquad (6)$$

By property (2), $\mathit{surplus}(v)$ represents an upper bound on the number of
edges $(v, w) \in D(v, U_{>})$ that can be added to $G_1$. If $\mathit{surplus}(v) < 0$ holds for
some v, we can see that v is not controlled by M in any graph $G = (V, E)$ with
$E_1 \subseteq E \subseteq E_2$. We thus assume that all vertices $v \in M$ satisfy $\mathit{surplus}(v) \ge 0$.
We now define a network $N = (G^* = (V^*, E^*),\ c : E^* \to \mathbb{R}_+)$ by

$$V^* = U_{>} \cup M \cup \{s, t\},$$

$$E^* = E_s \cup E_t \cup D(U_{>}, M),$$

where $E_s = \{(s, v) \mid v \in M\}$ and $E_t = \{(w, t) \mid w \in U_{>}\}$, and a capacity function

$$c(e) = \begin{cases} \mathit{surplus}(v) & \text{if } e = (s, v) \in E_s, \\ \mathit{deficit}(w) & \text{if } e = (w, t) \in E_t, \\ 1 & \text{if } e = (v, w) \in D(U_{>}, M). \end{cases}$$

For example, let us consider the problem instance given in Figure 1. We can
see that the corresponding network N is represented by Figure 2.

Fig. 1. Two graphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ with $E_1 \subseteq E_2$, where
$V = \{p, q, r, u, v, w\}$ and $M = \{p, q, r\}$.

Fig. 2. The network $N = (G^* = (V^*, E^*),\ c : E^* \to \mathbb{R}_+)$ associated with $G_1$, $G_2$ and
M in Figure 1.



The following lemma shows our problem can be reduced to a network flow 
problem in G*. 

Lemma 1. There exists a graph $G = (V, E)$ such that (1) $E_1 \subseteq E \subseteq E_2$ and (2)
M is a monopoly in G, if and only if the network N has a maximum s-t flow
whose size is $\sum_{w \in U_{>}} \mathit{deficit}(w)$.
Proof. Since $c(w, t) = \mathit{deficit}(w)$, no s-t flow has size greater
than $\sum_{w \in U_{>}} \mathit{deficit}(w)$. Let us assume first that network N has a maximum s-t
flow whose size is $\sum_{w \in U_{>}} \mathit{deficit}(w)$. Since $c(e)$, $e \in E^*$, is an integer, N has an
integral maximum s-t flow f, i.e., for each $e \in E^*$, $f(e)$ is an integer. Let

$$E = E_1 \cup D(M, M) \cup \{e \in D(U_{>}, M) \mid f(e) = 1\}.$$

Since f eliminates all deficits, we can see that M is a monopoly in the graph
$G = (V, E)$.

On the other hand, let us assume that M is a monopoly in the graph
$G = (V, E)$ with $E_1 \subseteq E \subseteq E_2$. For each $w \in U_{>}$, we arbitrarily choose $\mathit{deficit}(w)$
edges from $E(w, M) - E_1(w, M)$, and let $A(w)$ be the set of $\mathit{deficit}(w)$ such
edges. Note that M is a monopoly in $G' = (V, E')$, where
$E' = E_1 \cup D(M, M) \cup \bigcup_{w \in U_{>}} A(w)$. We now assign nonnegative integers $f(e)$ to each
$e \in E^* = E_s \cup E_t \cup D(U_{>}, M)$ as follows.

$$f(e) = \begin{cases} c^*(e) & \text{if } e = (s, v) \in E_s, \\ \mathit{deficit}(w) & \text{if } e = (w, t) \in E_t, \\ 1 & \text{if } e = (v, w) \in D(M, U_{>}) \cap A(w), \\ 0 & \text{if } e = (v, w) \in D(M, U_{>}) \setminus A(w), \end{cases}$$

where $c^*(e) = |\{(v, w) \in A(w) \mid w \in U_{>}\}|$ for $e = (s, v) \in E_s$. We can see that
this f is an s-t flow and its size is $\sum_{w \in U_{>}} \mathit{deficit}(w)$. □

For example, Figure 3 shows a maximum flow in the network N given in
Figure 2. This flow corresponds to the graph G in Figure 3.

Let us note that the size of the network N satisfies $|V^*| \le n + 2$, $|E^*| \le n + m_2$
and $\max c(e) \le n$, where $n = |V|$ and $m_2 = |E_2|$. Since a maximum flow on such
a network can be computed in 0(min{(n + 777,2)^/^, n^/^(n + m2)}) time P(1 . we 
have the following result. 

Theorem 1. The monopoly verification problem can be solved in
$\tilde{O}(\min\{(n + m_2)^{3/2},\ n^{2/3}(n + m_2)\})$ time. □
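
The reduction of Lemma 1 can be sketched in Python. This is a minimal illustration, not the algorithm behind the stated time bound: it uses a plain augmenting-path (Edmonds-Karp) maximum flow, and the names `max_flow` and `monopoly_exists` as well as the example instances are hypothetical.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a residual capacity map cap[x][y] (modified in place)."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # BFS for an augmenting path
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            return flow
        bottleneck, y = float("inf"), t
        while parent[y] is not None:
            bottleneck = min(bottleneck, cap[parent[y]][y])
            y = parent[y]
        y = t
        while parent[y] is not None:              # push flow along the path
            x = parent[y]
            cap[x][y] -= bottleneck
            cap[y][x] += bottleneck
            y = x
        flow += bottleneck

def monopoly_exists(V, E1, E2, M):
    """Decide whether some E with E1 <= E <= E2 makes M a monopoly."""
    M = set(M)
    E1 = {frozenset(e) for e in E1}
    D = {frozenset(e) for e in E2} - E1
    e_plus = E1 | {e for e in D if e <= M}        # E1 together with D(M, M)

    def balance(v, edges):
        """|N(v) & M| - |N(v) & U| for the closed neighborhood of v."""
        nbrs = {v} | {x for e in edges if v in e for x in e}
        in_m = len(nbrs & M)
        return in_m - (len(nbrs) - in_m)

    surplus = {v: balance(v, e_plus) for v in M}  # equation (6)
    if any(x < 0 for x in surplus.values()):
        return False
    deficit = {w: -balance(w, E1) for w in set(V) - M}   # equation (4)
    needy = {w: d for w, d in deficit.items() if d > 0}  # vertices with deficit > 0

    s, t = ("source",), ("sink",)
    cap = defaultdict(lambda: defaultdict(int))
    for v, c in surplus.items():
        cap[s][v] = c
    for w, d in needy.items():
        cap[w][t] = d
    for e in D:                                   # unit edges between M and needy U
        for v in e & M:
            for w in e - M:
                if w in needy:
                    cap[v][w] = 1
    return max_flow(cap, s, t) == sum(needy.values())
```

For instance, with V = {a, b, c, d}, M = {a, b}, E1 = {(a, b)} and E2 = E1 plus (a, c), (b, d), (c, d), the function reports that a suitable E exists (add (a, c) and (b, d)).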

3 Max-Neighborhood Monopoly Problem 

In this section, we consider the max-neighborhood monopoly problem. This sec- 
tion always assumes that the answer to the monopoly verification problem is 
“Yes”, i.e., there exists a graph $G = (V, E)$ such that $E_1 \subseteq E \subseteq E_2$ and M is
a monopoly in G. We show that the max-neighborhood monopoly problem can 
be solved in polynomial time by solving a maximum weighted matching in an 
associated graph. 

Recall the definitions $U = V \setminus M$ and $D = E_2 \setminus E_1$. Let $G = (V, E)$ be a
solution to the max-neighborhood monopoly problem, where $E = E_1 \cup A$ with

$\tilde{O}(\cdot)$ notation is similar to the usual $O(\cdot)$ notation except that $\tilde{O}(\cdot)$ ignores
logarithmic factors.

Fig. 3. A maximum flow in the network N given in Figure 2 and the corresponding
graph G.



$A \subseteq D$. By the maximality of A, A clearly contains $D(M, M)$. Let $U_{>}$ and $U_{\le}$
be as defined in the previous section. For a vertex $v \in U$ (resp., $v \in M$), define
$\mathit{deficit}(v)$ (resp., $\mathit{surplus}(v)$) by (4) (resp., (6)). Further, for each $v \in M$, we
define the “usable surplus,”

$$\mathit{surplus}^*(v) = \min\{\mathit{surplus}(v),\ |D(v, U)|\}. \qquad (7)$$

By our assumption, $\mathit{surplus}^*(v) \ge 0$ and $|A(v, U)| = \mathit{surplus}^*(v)$ holds for each
$v \in M$. For every maximum A, $|A(M, M \cup U)|$ is of the same size, and hence a
maximum A contains a maximum $A(U, U)$.

We now associate a graph $G^* = (V^*, E^*)$ with $G_1 = (V, E_1)$, $G_2 = (V, E_2)$
and $M \subseteq V$:

$$V^* = V_1 \cup V_2 \cup V_3 \cup V_4,$$

$$E^* = \bigcup_{(v, w) \in D(M \cup U,\, U)} E_{(v, w)},$$

where

$$V_1 = \{v_1, v_2, \ldots, v_{\mathit{surplus}^*(v)} \mid v \in M\},$$

$$V_2 = \{v_1, v_2, \ldots, v_{|\mathit{deficit}(v)|} \mid v \in U\},$$

$$V_3 = \{x_{e1}, x_{e2}, x_{e3}, x_{e4} \mid e \in D(M, U)\},$$

$$V_4 = \{z_{\langle v, w \rangle}, z_{\langle w, v \rangle} \mid (v, w) \in D(U, U)\},$$



'{{Vi,Xel), {Xee,Xe(e+l)), {Xe2,Wj) \ Vi G Vi , Wj G V2,l = 1,2,3} 

if e = {v, w) G D{M, C/>), 
{{vi,Xei), {xee,Xi,(e+i)) I G li,^ = 1,2,3} if e = (v,w) G D(M, U<), 
{(a;e4,2:<„.™>), («<u,m>, 2 <m.„>), , »e '4 ) | 6 G D{M,v),e' G D(M,w)} 

E(^,w) = <( if (v, w) G D{U>,U>), 

{(^e4,2^<i;,ij,>), ('^2,2^<‘U,ii>>), (•2'<'u;,Tj>,Xe/4) | C G D(^A1 ^ V^ , 

e' G D{M,w),Vi G Vi} if (v, u;) G D{U<,U>), 

{(Xe4, -2 ^<i;,'U,>)5 (^*, ) , ('2^< 

, I e G -D(M, ii), e' G H(M, w),Vi,Wj G li} if (i>, ui) G D{U<,U<). 



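Here we assume that $z_{\langle v, w \rangle} \neq z_{\langle w, v \rangle}$. Moreover, let us define a function
$\mathit{weight} : E^* \to \mathbb{R}_+$ by

$$\mathit{weight}(e^*) = \begin{cases} 4L & \text{if } e^* = (x_{e1}, x_{e2}), \\ L & \text{if } e^* = (x_{e3}, x_{e4}), \\ 3L & \text{if } e^* \neq (x_{e1}, x_{e2}), (x_{e3}, x_{e4}) \text{ and } e^* \in E_{(v, w)} \text{ with } (v, w) \in D(M, U), \\ 3 & \text{if } e^* = (z_{\langle v, w \rangle}, z_{\langle w, v \rangle}), \\ 2 & \text{otherwise,} \end{cases} \qquad (8)$$

where

$$L > \sum_{e \in D(U, U)} \mathit{weight}(E_e). \qquad (9)$$

Here $\mathit{weight}(T) = \sum_{e^* \in T} \mathit{weight}(e^*)$ for a set $T \subseteq E^*$.

Note that every $E_{(v, w)}$ forms a tree, and satisfies $E_{(v, w)} \cap E_{(v', w')} = \emptyset$ if
$(v, w) \neq (v', w')$. By (8) and (9),

$$\mathit{weight}(e^*) > \sum_{(v, w) \in D(U, U)} \mathit{weight}(E_{(v, w)}) \qquad (10)$$

holds for each $e^* \in E_{(v, w)}$ with $(v, w) \in D(M, U)$.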

We now show that a maximum weighted matching S in $G^*$ corresponds to a
desired graph G. Let Weight denote the weight of a maximum weighted matching
S, i.e., $\mathit{Weight} = \mathit{weight}(S)$. Let $\theta = \theta_1 + \theta_2$, where

$$\theta_1 = L \sum_{v \in M} \mathit{surplus}^*(v) + L \sum_{v \in U_{>}} \mathit{deficit}(v) + 5L\,|D(M, U)|, \qquad (11)$$

$$\theta_2 = 3\,|D(U, U)| + |A(U, U)|. \qquad (12)$$

(Recall that $G = (V, E = E_1 \cup A)$ is a desired graph.)

Lemma 2. $\mathit{Weight} \ge \theta$ holds.
Proof. Let us construct, from A, a matching T of $G^*$ with weight $\mathit{weight}(T) = \theta$.
We first introduce mappings $\alpha$ and $\beta$ for each edge $(v, w) \in A(M, U)$.

Let $\alpha$ be an arbitrary one-to-one mapping from $A(M, U)$ to $V_1$ such that
$\alpha(v, w) = v_i$ for some $i = 1, 2, \ldots, \mathit{surplus}^*(v)$. Since every $v \in M$ satisfies
$|A(v, U)| = \mathit{surplus}^*(v)$, $\alpha$ is well-defined. Let
$V' = \{w_1, w_2, \ldots, w_{\mathit{deficit}(w)} \mid w \in U_{>}\}$ and $V'' = \{x_{e3} \mid e \in D(M, U)\}$.
Let $\beta$ be an arbitrary injective mapping
from $A(M, U)$ to $V' \cup V''$ such that (i) either $\beta(v, w) = w_i$ or $\beta(v, w) = x_{(v, w)3}$ holds
for every $(v, w) \in A(M, U)$, and (ii) for every $w_i \in V'$, there exists an edge
$(v, w) \in A(M, U)$ such that $\beta(v, w) = w_i$. Here a mapping $\chi$ is called injective if
$\chi(p) \neq \chi(q)$ holds for $p \neq q$. Since M is a monopoly in G, $\beta$ is also well-defined.
These $\alpha$ and $\beta$ show how to allocate the surplus on M to the deficit on U, where
$\beta(v, w) = x_{(v, w)3}$ means that $(v, w)$ produces the surplus on w.

Similarly, for each w € U, let be an injective mapping from A{w, U) 

to {wi,W 2 ,.-.,w_deficit(w)}^ = X(„,„,) 3 }, where {wi, 1 ^ 2 , . . . , 

W-deficit{w)} = 0 if deficit(w) > 0. Intuitively, ^w{w,u) = Wi means that the 
surplus on w is used to add {w, u) to Gi. On the other hand, jw(w, u) = 
means that the surplus on v which is transferred through the edge (v, w) to w 
is used to add {w,u) to Gi. 

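From these mappings, we define a matching T in $G^*$. For an edge $e = (v, w) \in D(M, U)$,
define a set $T_e$ of edges in $G^*$ by

$$T_e = \begin{cases} \{(\alpha(e), x_{e1}),\ (x_{e2}, \beta(e)),\ (x_{e3}, x_{e4})\} & \text{if } e \in A(M, U) \text{ and } \beta(e) = w_i \text{ for some } i, \\ \{(\alpha(e), x_{e1}),\ (x_{e2}, \beta(e))\} & \text{if } e \in A(M, U) \text{ and } \beta(e) = x_{e3}, \\ \{(x_{e1}, x_{e2}),\ (x_{e3}, x_{e4})\} & \text{otherwise,} \end{cases} \qquad (13)$$

and for an edge $e = (w, u) \in D(U, U)$, define a set $T_e$ of edges in $G^*$ by

$$T_e = \begin{cases} \{(\gamma_w(e), z_{\langle w, u \rangle}),\ (\gamma_u(e), z_{\langle u, w \rangle})\} & \text{if } e \in A(U, U), \\ \{(z_{\langle w, u \rangle}, z_{\langle u, w \rangle})\} & \text{otherwise.} \end{cases} \qquad (14)$$

Let $T = \bigcup_{e \in D(M \cup U,\, U)} T_e$. Then T forms a matching in $G^*$. Note
that, for an edge $e = (v, w) \in D(M, U)$, $\mathit{weight}(T_e) = 7L$, $6L$ or $5L$, which
respectively correspond to the first, second, and third cases in (13). Exactly
$\sum_{w \in U_{>}} \mathit{deficit}(w)$ (resp., $\sum_{v \in M} \mathit{surplus}^*(v) - \sum_{w \in U_{>}} \mathit{deficit}(w)$ and
$|D(M, U)| - \sum_{v \in M} \mathit{surplus}^*(v)$) edges belong to the first case (resp., the second and third
cases). Thus,

$$\sum_{e \in D(M, U)} \mathit{weight}(T_e) = \theta_1. \qquad (15)$$

For an edge $e = (w, u) \in D(U, U)$, $\mathit{weight}(T_e) = 4$ or $3$, which respectively
correspond to the first and second cases in (14). Exactly $|A(U, U)|$ (resp.,
$|D(U, U)| - |A(U, U)|$) edges belong to the first case (resp., the second case). Thus,

$$\sum_{e \in D(U, U)} \mathit{weight}(T_e) = \theta_2, \qquad (16)$$

which together with (15) implies $\mathit{weight}(T) = \theta$. This completes the proof. □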
To show the opposite inequality, let us show the following lemma. 
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Lemma 3. Let T be a matehing of G* sueh that weight (T) > 0\. Then (ESJ 
holds. 

Proof. For each edge e G D{M,U), Eg. forms a tree, and is a matching in 
Eg, where Tg = T C\ Eg. The weight of a maximum matching in is 5L 

when surplus* {v) = 0. When surplus* (v) > 0, the weight is 7L if defieit{w) > 
0, and 6L, otherwise. However, since T is a matching, (1) among those with 
surplus* (v) > 0 and deficit{w) > 0, at most edges can 

have weight 7L, and (2) among those with surplus* (v) > 0, at most 
surplus* edges can have weight at least 6L. Since the weight of Tg is 

at least 5L, we have 



weight{Tg) < L surplus* (v) + L defieit{v) + 5L\D{M, U)\. (17) 

e£D(M,U) v€M veU> 

Moreover, if XeGD(MC/) weight{Tg) < 6>i, then 

weightiTg) <0i — L. (18) 

e£D(M,U) 

From (da, this implies (H3. □ 

Let T be a matching in G* such that weight{T) > 0i. Then the proof of 
Lemma 0 also shows that UeGD(Mif)^e gives a desirable A' in the sense that 
M is a monopoly in G = {V,E\ U D{M,M) U A'), where A' is obtained by 
reversing the construction of (tT^ . Moreover, this implies that UeG £>((7 u) gives 
a desirable A" (i.e., M is a monopoly in G = {V, EiLlD{M, M)UA'LIA'')), where 
A” is obtained by reversing the construction of da. More precisely, e € A” if 
and only if weight{Tg) = 4. We therefore have the following lemma: 

Lemma 4. $\mathit{Weight} \le \theta$ holds. □

From Lemmas 2 and 4 we obtain an interesting characterization of the
max-neighborhood monopoly problem.

Corollary 1. $\mathit{Weight} = \theta$ holds. □

Let us note that the size of the graph $G^*$ satisfies $|V^*| = O(m_2)$, $|E^*| = O(m_2)$
and $\max \mathit{weight}(e^*) = O(m_2)$, where $m_2 = |E_2|$. Since a maximum
weighted matching on such a graph can be computed in 0 (rri 2 ^^) time 0, we 
have the following theorem. 



Theorem 2. The max-neighborhood monopoly problem ean be solved in 0(m^^'^) 
time. □ 
4 Min-Neighborhood Monopoly Problem

In this section, we consider the min-neighborhood monopoly problem. As in
Section 3, we assume in this section that there exists a graph $G = (V, E)$ such
that $E_1 \subseteq E \subseteq E_2$ and M is a monopoly in G.

Recall the definition $U = V \setminus M$. For a vertex $v \in V$, let

$$\mathit{surplus}(v) = |N_{G_1}(v) \cap M| - |N_{G_1}(v) \cap U|, \qquad (19)$$

and let $\mathit{deficit}(v) = -\mathit{surplus}(v)$. Note that these definitions are different from
those in the previous sections. By definition, v is controlled by M in $G_1$ if and
only if $\mathit{surplus}(v) \ge 0$ (i.e., $\mathit{deficit}(v) \le 0$). Let $M_-$, $M_0$ and $M_+$ be the sets of
vertices $v \in M$ such that $\mathit{surplus}(v) < 0$, $\mathit{surplus}(v) = 0$ and $\mathit{surplus}(v) > 0$,
respectively, and let $U_-$, $U_0$ and $U_+$ be the sets of vertices $v \notin M$ such that
$\mathit{surplus}(v) < 0$, $\mathit{surplus}(v) = 0$ and $\mathit{surplus}(v) > 0$, respectively. By definition,
$M = M_- \cup M_0 \cup M_+$ and $U = U_- \cup U_0 \cup U_+$.

Let $G = (V, E)$ be a solution to the min-neighborhood monopoly problem,
and let $E = E_1 \cup A$, where $A \subseteq D = E_2 \setminus E_1$. Since minimizing E is clearly
equivalent to minimizing A, we discuss properties of A instead of those of E.
Based on discussions in Section 2, without loss of generality, we can assume

$$A \subseteq D(M, M) \cup D(U_-, M). \qquad (20)$$

The following lemma is easy to prove: 

Lemma 5. For each $v \in U_-$,

$$|A(v, M)| = \mathit{deficit}(v). \qquad (21)$$

The following corollary is a direct consequence of the above lemma.

Corollary 2. $|A(U_-, M)| = \sum_{v \in U_-} \mathit{deficit}(v)$. □

Corollary 2 implies that, for every minimum A, $|A(U_-, M)|$ is of the same
size, and hence A contains a minimum $A(M, M)$.

We now associate a graph G* = (V*,E*) with Gi = (V,Ei), G 2 = (V,E 2 ) 
and M CV as follows: 

y* = Cl U R 2 U V 3 U C 4 U V 5 , 

E* = [J U [J EyUEa, 

veU-,weM 

where 

Vl = {vi,V2, ■ ■ ■,Vaeficit(v) I V £ M-}, 

F2 = {Vl,W 2 , ■ • ■,Vdeficit(v) I V £ U-}, 

L3 {'Cl , V 2 , . . . , U I C £ { , 

L4 (v ,w) : y(v ,w) I (c,'ic) £ L^(t/— ,A/){, 
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Vs = {z<v,m>, I (v,w) € D(M,M)}, 

-^(v,w) {(Vi,X( 

V,W )),(*(« ,tu) 1 y(v,w )).(s/( v,w 

{yiv,w),Wj)\vi G V2,Wj G V 3 ,z<cn,,u> G Y5} for (i!,w) G D{U-,M+), 

^{v,w) ^ ^ (v ,'w)') 1 1 y(v,vj)}j (y(v,w) j Z^uj,u> ) | 

Vi G V2,z<u,,u> G Vs} for (u,u;) G D{U-,MqU M-), 

Ev = {{vi,z<v,w>)\vi eVi,{v,w) e D{v,M)} for v G M_, 

Ea ~ {{z<Cv,w> ^ ) | Z<:iv,w> ; Z<Cw,v> €Vs}. 



Here we assume that S(^^u,) = S(uj,i;) for s = x,y, and z<v,vj> ^ z<:w,v>- We also 
define a function, weight : E* i-4 by 

r 1 ife€Ea, 
weight (e) = < 3L if e G if;,, 

[ 2 L otherwise, 

where E}, = {(a:(„ w),y(v u;)) \x{v w),y{v w) S V4} and L is a number greater than 

\Ea\. 

Let S be a maximum weighted matching in $G^*$, and let Weight denote its
weight; i.e., $\mathit{Weight} = \mathit{weight}(S)$, where $\mathit{weight}(T) = \sum_{e \in T} \mathit{weight}(e)$ for a set
$T \subseteq E^*$.

Although the proof is skipped due to the space limitation, we can show that 
computing a maximum weighted matching in G* is polynomially equivalent to 
the min-neighborhood monopoly problem. 

Lemma 6. Let Weight and $L$ be as defined above, and let $A$ be a minimum edge
set added to $G_1$ so that $M$ is a monopoly in $G = (V, E_1 \cup A)$. Then

$\mathrm{Weight} - \theta = |D(M,M)| - |A(M,M)|$

holds, where

$\theta = 3L\,|D(U_-, M)| + L \sum_{v \in U_-} \mathrm{deficit}(v) + 2L \sum_{v \in M_-} \mathrm{deficit}(v)$. (22)

Let us note that the size of the graph G* satisfies |fo*| = 0 (m 2 ), \E*\ = 
0(7712) and max. weight (e*) = 0(7772), where m2 = |£'2|- Since a maximum 
weighted matching on such a graph can be computed in 0(7772^^) time 0 |, we 
have the following theorem. 

Theorem 3 . The min-neighborhood monopoly problem can be solved in 0(7772' ) 
time. □ 

5 Max Controlled Set Problem 

Let us finally consider the max controlled set problem. Unfortunately, this prob- 
lem is intractable, even if we restrict ourselves to the edge-augmentation and the 
edge-deletion problems. 
Theorem 4. The max controlled set problem is NP-hard, even if $G_1$ is empty
or even if $G_2$ is a complete graph. □

Since the max controlled set problem seems to be intractable, we consider an 
approximation algorithm. We present a simple approximation algorithm which 
guarantees an approximation ratio of 2. 

For two graphs $G_1 = (V,E_1)$ and $G_2 = (V,E_2)$, and a set $M \subseteq V$, we
construct two graphs $G^+ = (V, E^+)$ and $G^{++} = (V, E^{++})$ for $G_1$ and
$G_2$ by

$E^+ = E_1 \cup D(M,M)$, and
$E^{++} = E_1 \cup D(M,M) \cup D(U,M)$,

respectively. Here $U = V \setminus M$. Let $W^+$ and $W^{++}$ be the sets of vertices in $V$
which are controlled by $M$ in $G^+$ and $G^{++}$, respectively. The following lemma
is immediate from the definitions of $G^+$ and $G^{++}$.

Lemma 7. Let $W^+$ and $W^{++}$ be as defined above. Let $\mathcal{W}$ be a family of sets
$W \subseteq V$ which are controllable by $M$ in some graph $G = (V, E)$ with $E_1 \subseteq E \subseteq E_2$.
Then we have

$|W^+ \cap M| = \max_{W \in \mathcal{W}} |W \cap M|$,
$|W^{++} \cap U| = \max_{W \in \mathcal{W}} |W \cap U|$.

Lemma 8. Let $W^+$ and $W^{++}$ be as defined above. Let $W^*$ be the larger of the
two, i.e., $|W^*| = \max\{|W^+|, |W^{++}|\}$. Then $W^*$ satisfies

$|W^*| \geq \frac{1}{2} \max_{W \in \mathcal{W}} |W|$.

Proof. It follows from Lemma 7 that $\max_{W \in \mathcal{W}} |W| \leq |W^+| + |W^{++}| \leq 2|W^*|$.
□

Theorem 5. Given two graphs $G_1 = (V,E_1)$, $G_2 = (V,E_2)$, and a set $M \subseteq V$,
we can compute in polynomial time a graph $G = (V,E)$ with $E_1 \subseteq E \subseteq E_2$ such
that the size of the set controlled by $M$ in $G$ is at least half of that of a maximum
controlled set. □ 
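
The construction behind Lemma 8 and Theorem 5 can be sketched in a few lines of Python. The fragment is only an illustration under stated assumptions: a vertex $v$ is taken to be controlled by $M$ when $|N_G(v) \cap M| - |N_G(v) \setminus M| \geq 0$ (the $f$-controlled condition with $f = 0$, which may differ in detail from the definition used earlier in the paper), and the names controlled_set and two_approximation are ours, not part of the paper.

```python
def controlled_set(vertices, edges, M):
    """Vertices v with |N(v) & M| - |N(v) - M| >= 0 (assumed meaning of 'controlled')."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    return {v for v in vertices if len(adj[v] & M) - len(adj[v] - M) >= 0}


def two_approximation(vertices, E1, E2, M):
    """Return (edge set, controlled set) for the better of G+ and G++."""
    M = set(M)
    E1 = {frozenset(e) for e in E1}
    E2 = {frozenset(e) for e in E2}
    D = E2 - E1                                 # edges that may still be added
    D_MM = {e for e in D if e <= M}             # both endpoints in M
    D_UM = {e for e in D if len(e & M) == 1}    # one endpoint in U = V \ M
    E_plus = E1 | D_MM
    E_plusplus = E_plus | D_UM
    W_plus = controlled_set(vertices, E_plus, M)
    W_plusplus = controlled_set(vertices, E_plusplus, M)
    if len(W_plus) >= len(W_plusplus):
        return E_plus, W_plus
    return E_plusplus, W_plusplus
```

Both candidate graphs are built with one pass over $E_2 \setminus E_1$, so the sketch runs in time linear in the size of the input graphs.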

6 Conclusion 

This paper discussed edge augmentation and deletion problems when the number 
of vertices controlled by a given set M of vertices is held at maximum. These 
problems were shown to be NP-complete in general, by a transformation from the 
maximum independent set problem. However, it can be determined in polynomial 
time if the addition (or deletion) of a set of edges can make M control all vertices, 
by reducing it to a network flow problem. 
One can easily extend the positive results in the following way. For a function
$f$ on $V$, a vertex $v \in V$ is called $f$-controlled if $|N_G(v) \cap M| - |N_G(v) \setminus M| \geq f(v)$.
Then the problems corresponding to the monopoly verification, max-neighborhood
monopoly and min-neighborhood monopoly problems can be solved in polynomial
time by applying the network flow and matching arguments to them,
respectively, and the approximation argument also holds for the max $f$-controlled
set problem. However, the NP-hardness result does not hold for every function
$f$. For example, if $f(v) = |V|$ for all $v \in V$, then the max $f$-controlled set problem
is polynomially solvable.

Some problems remain to be addressed in further work. One issue is the 
search for faster or simpler algorithms for our problems. Another issue is to 
consider the max controlled set problem for special classes of graphs.

Acknowledgments. We thank the referees for their helpful suggestions. 
Abstract. System level fault diagnosis deals with the problem of identifying
component failures in a multiprocessor system. Each processor is
either faulty or fault-free, and the objective is to find out the fault status
of each processor in the network by letting the processors test each
other. A test of a processor by another processor is possible if they are 
connected in the system. If the tester itself is fault-free, it always reports 
the fault status of the testee, but if the tester is faulty, the result of the 
test cannot be trusted. 

We show that for the hypercube multiprocessor system of dimension $n$,
in which at most $n$ processors are faulty, adaptive diagnosis is possible
using at most $2^n + n - 1$ tests, which improves earlier bounds and is
optimal. We also present an algorithm which diagnoses the hypercube in
4 testing rounds, where each processor is scheduled for at most one test
in each round.



1 Introduction 

In the fault diagnosis model of Preparata, Metze, and Chien m, each processor 
is either faulty or fault-free. The problem is to locate all faulty processors in the 
system by letting the processors test each other. A test of a processor v (the 
testee) by another processor u (the tester) denoted (u,v) is possible iff u and v 
are connected in the system. The outcome of a test is 0 (1) if the tester diagnosed
the testee as fault-free (faulty). If the tester itself is fault-free it always reports
the fault status of the testee, but if the tester is faulty, the result of the test
cannot be trusted. In adaptive diagnosis, the tests to be made may be chosen
based on the outcomes of previous tests. Furthermore, it is assumed that the
fault-status of a processor does not change during the diagnosis. 
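
The test model just described can be simulated directly. The following Python sketch is illustrative only; the function name run_test and the use of a random answer for faulty testers are our assumptions (a faulty tester may in fact answer adversarially, which the random choice merely samples).

```python
import random


def run_test(tester, testee, faulty, rng=random):
    """Outcome of the test (tester, testee): 0 means 'fault-free', 1 means 'faulty'.

    A fault-free tester reports the true status of the testee.
    A faulty tester may report anything, modelled here by a random bit.
    """
    if tester in faulty:
        return rng.randint(0, 1)
    return 1 if testee in faulty else 0
```

For instance, with faulty = {3}, the test (0, 1) always yields 0 and the test (1, 3) always yields 1, while the outcome of (3, 1) carries no information.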

We show that for the hypercube multiprocessor system of dimension n, in 
which at most n processors are faulty, adaptive diagnosis is possible using at 
most 2" -b n — 1 tests, which is optimal. shows the best previous bound of 
2" -b ^ tests. We also present an algorithm which diagnoses the hypercube in 
4 testing rounds, where each processor is scheduled for at most one test of each 
round. This result improves on the 11 round algorithm in 0. 
2 Historical Notes

The problem of fault diagnosis of multiprocessor systems was originated by 
Preparata, Metze and Chien |]^. They show that sometimes it is not possi- 
ble to diagnose a multiprocessor system even though all possible tests are made. 
Two necessary conditions for a system to be diagnosable are pointed out. The first
one is that a strict majority of the processors must be fault-free, and the second 
one says that there may not be more faulty processors than the minimum degree 
of the system. The degree of a processor is the number of processors connected 
to it, and the minimum degree of a system is the minimum of the degrees of all 
its processors. 

The early research focused on non-adaptive diagnosis, i.e. you must decide 
exactly which tests you will make prior to the diagnosis. Nakajima 0 is the first 
to propose adaptive diagnosis, in which you may choose your tests depending 
on the results of previous tests. This method increases the efficiency of fault 
diagnosis drastically. m shows that a complete system on n processors, i.e. all 
pairs of processors are connected, where at most k processors are faulty, requires 
kn tests in the non-adaptive case. In contrast, Blecher ^ shows that n + k — 1 
tests are necessary and sufficient for the adaptive diagnosis of the same system. 
Note that this lower bound applies to all systems, since by removing connections 
you will not add any testing possibilities. The question of whether this bound 
is tight for some sparse systems as well is partially answered in the affirmative
in 0, in which some simple systems with at most a constant number of faulty 
processors are considered. 

Various kinds of fault diagnosis of the hypercube system are studied in P, 
0, and p. The first results on adaptive diagnosis of the hypercube is obtained 
by Feng et al. |0|. They present an algorithm using at most 2"([lognJ -1-2) tests, 
embedded in at most n -I- 4 testing rounds. Kranakis and Pelc jS| shows that 
2 « - 1 - to ^gg|;g sufficient, and that adaptive diagnosis can be carried out in 
a constant number of rounds. We improve on their results by showing that
$2^n + n - 1$ tests are always enough, and 4 testing rounds suffice. Our methods
differ from the ones in p. 

3 Preliminary Definitions 

In the following the hypercube system of processors will be modelled by an undirected
graph in which the nodes represent processors, and the edges represent
connections between pairs of processors. Let $P$ be the graph consisting of two
connected nodes; then the hypercube of dimension $n$, denoted $H_n$, is defined as
the Cartesian product $H_n = P \times H_{n-1}$, where $H_1 = P$. From this definition, it
is clear that the hypercube has $2^n$ nodes and that it can be divided into two
identical hypercubes isomorphic to $H_{n-1}$, let us call them $A$ and $B$, s.t. every
node in $A$ is connected to exactly one node in $B$, and vice versa. We say that
$(A, B)$ is a mirror decomposition of $H_n$ for any such partitioning, and define the
corresponding mirror function A2B mapping every node of $A$ to its neighbour in
$B$. Analogously, B2A is defined as the inverse function. For any set $S$ of nodes,
we let $p(S)$ be the number of faulty nodes contained in $S$. We know the correctness
of a set $K$ of nodes, if for all nodes in $K$ we know whether they are faulty
or not. 
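
A mirror decomposition is easy to write down explicitly. The Python sketch below assumes a labelling that the text does not fix: the nodes of $H_n$ are the integers $0, \ldots, 2^n - 1$, and two nodes are connected iff their binary labels differ in exactly one bit. The function names are ours.

```python
def hypercube_edges(n):
    """Undirected edges (u, v), u < v, of H_n under the assumed integer labelling."""
    return {(u, u ^ (1 << b)) for u in range(2 ** n) for b in range(n)
            if u < u ^ (1 << b)}


def mirror_decomposition(n, bit):
    """Split H_n along coordinate `bit` into A (bit = 0) and B (bit = 1).

    Returns A, B and the mirror functions A2B and B2A as dictionaries.
    """
    A = [u for u in range(2 ** n) if not u & (1 << bit)]
    B = [u for u in range(2 ** n) if u & (1 << bit)]
    a2b = {u: u | (1 << bit) for u in A}
    b2a = {v: u for u, v in a2b.items()}
    return A, B, a2b, b2a
```

Any of the $n$ coordinates may be used for the split, so $H_n$ has $n$ different mirror decompositions of this kind.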



4 Adaptive Fault Diagnosis in $2^n + n - 1$ Tests

The lower bound of $2^n + n - 1$ tests for the complete graph on $2^n$ nodes
with at most $n$ faulty ones (Blecher's bound from the historical notes, applied
with $2^n$ processors and $k = n$) applies to the hypercube as well. We show that
this bound is tight for the hypercube.

4.1 Some Definitions and Lemmas 

An honest trail $T = (v_1, v_2, .., v_m)$ in a graph is a tuple of distinct connected
nodes along with a collection of test results, $(v_i, v_{i+1}) = 0$ for all $i = 1, 2, .., m - 1$.



Lemma 1. Let $G = (V,E)$ be a graph with $p(V) \leq k$, and $T = (v_1, v_2, .., v_m)$ be
an honest trail in $G$. If $p(V \setminus T) = w$, then all $v_i$ are fault-free for $i > k - w$.

Proof. The proof is by contradiction. Assume $v_j$ faulty, then $v_{j-1}$ is faulty as
well, since the latter node has diagnosed $v_j$ as fault-free. Repeating the argument,
we conclude that all $v_l$ for $l < j$ are faulty, but we know that there is at most a
total of $k$ faulty nodes in the graph, so $j + w \leq k$. Thus, $v_i$ cannot be faulty for
$i > k - w$. □

Corollary 1. The last node $v_{k+1}$ of an honest trail $T = (v_1, v_2, .., v_{k+1})$ in a
graph with at most k faulty nodes, is fault-free. 
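
Lemma 1 can be checked by brute force on short trails. The Python sketch below is an illustration with names of our own choosing: it enumerates every fault pattern on a trail of $m$ nodes that is consistent with all tests along the trail having outcome 0 and with at most $k - w$ faults on the trail, and compares it with the indices the lemma certifies.

```python
from itertools import product


def guaranteed_fault_free(m, k, w):
    """1-based indices i on an honest trail v_1..v_m with i > k - w (Lemma 1)."""
    return [i for i in range(1, m + 1) if i > k - w]


def consistent_trail_states(m, max_faults):
    """Fault patterns (True = faulty) consistent with (v_i, v_{i+1}) = 0 for all i."""
    states = []
    for state in product([False, True], repeat=m):
        if sum(state) > max_faults:
            continue
        # a fault-free tester that reports 0 forces its testee to be fault-free
        if all(state[i] or not state[i + 1] for i in range(m - 1)):
            states.append(state)
    return states
```

In every consistent pattern the faulty nodes form a prefix of the trail, which is exactly the observation used in the proof.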

A vertex cut C of a connected graph G = {V, E) is a subset of V s.t. the graph 
induced by removing C from G is no longer connected. The size of the smallest 
vertex cut is called the connectivity of the graph. It is a well known fact that 
the hypercube of dimension n > 2 has connectivity n. 

Lemma 2. Let $G = (V,E)$ be a graph with connectivity $k$ and $p(V) \leq k$. Suppose
we know the correctness of a subset $K$ of the nodes $V$, and at least one
of them is fault-free, and we also know an honest trail $T = (v_1, v_2, .., v_m)$, s.t.
$T \cap K = \emptyset$, then

- In at most $|V| - |K| - |T| + \min(|T|, p(V \setminus K) + 1, k - p(K))$ tests, we can
diagnose all of $G$.

- If in addition, we know a node $w \notin K \cup T$ s.t. $(v_m, w) = 1$, we can diagnose
all of $G$ in at most $|V| - |K| - |T| + \min(|T|, p(V \setminus K), k - p(K) - 1)$ tests.

Proof. If $|T| > k - p(K)$, then by lemma 1 we know that the last nodes on the
trail must be fault-free. Remove these from the trail and put them into $K$ (since
they are correctly diagnosed). Thereby we have assured that $|T| + p(K) \leq k$.
As long as $K \neq V$, we wish to extend the set $K$ of diagnosed nodes. It can be
accomplished by repeating the following procedure until the correctness of all
nodes is learned. If $p(K) < k$, we know that there must be a fault-free node $u$
in $K$ adjacent to some node $v$ in $(V \setminus (K \cup T)) \cup \{v_1\}$. This is because the set
consisting of all nodes in $T$ except $v_1$ and the $p(K)$ faulty nodes in $K$, cannot be
a vertex cut simply because its size is less than the connectivity of $G$. Make the
test $(u, v)$ to diagnose $v$ and put it into $K$. We distinguish between two cases.
If $v = v_1$ and the outcome of the test was 1, we remove $v_1$ from the trail $T$ and
rename the remaining nodes of the trail so that the node formerly known as $v_{i+1}$
now is called $v_i$ for $1 \leq i \leq m - 1$. If $v = v_1$ was found fault-free though, all
nodes in $T$ are removed and put into $K$, since they all must be fault-free, leaving
an empty trail. In the second case, when $v \neq v_1$, and if the result of the test was
1, and $|T| + p(K) = k$, the last node of the trail must be fault-free according
to lemma 1, and therefore it too is removed from $T$ and put into $K$. If we ever
get $p(K) = k$, we immediately know that all nodes in $V \setminus K$ must be fault-free.
Either way, eventually $K = V$, and we are done.

The tests made are easily counted, since every node not initially in $K$ is
tested at most once, and the nodes on the original trail $T$ are tested in order, i.e.
$v_i$ is tested before $v_j$ for $i < j$, and we stop testing the nodes on the trail either
when we find a fault-free one, or all turned out to be faulty. If only $l < m$ of the
nodes on the original trail $T$ are faulty, the last nodes $v_i$ for $i > l + 1$ are not
tested at all since they must be fault-free if $v_{l+1}$ is. This also proves the small
improvement in the number of tests in the second statement of the lemma, since
in this case, when only $l < m$ nodes on the trail are faulty, $w$ must be faulty. On
the other hand, if $w$ was diagnosed fault-free, then all $v_i \in T$ must be faulty. □



4.2 The Main Result 

Our proof that a hypercube graph of dimension $n \geq 3$ with at most $n$ faulty
nodes can be adaptively diagnosed in at most $2^n + n - 1$ tests, is by induction. In
fact, we show that something slightly stronger holds. First, we show the result
for the hypercube of dimension 2.

Lemma 3. The hypercube of dimension 2 with at most 2 faulty nodes, either
cannot be diagnosed at all, and this can be established in 4 tests, or it is diagnosed
in 4 tests when no nodes are faulty, or at most 5 tests when some nodes are faulty.

Proof. The proof is by extensive case analysis. The hypercube of dimension 2
is a ring of four nodes. Call the nodes on the ring $r_i$, $0 \leq i < 4$, where $r_i$ is
connected to $r_{i+1 \bmod 4}$. Make all four tests along one direction of the ring, i.e.
$(r_i, r_{i+1 \bmod 4})$, $0 \leq i < 4$. If all four tests resulted in 1, the ring cannot be
diagnosed, since either $r_0$ and $r_2$ are faulty ($r_1$ and $r_3$ are fault-free) or $r_1$ and
$r_3$ are faulty ($r_0$ and $r_2$ are fault-free). Still, note that it is sufficient to establish
the correctness of just any one of the four nodes in order to diagnose all of them.
If all tests on the ring resulted in 0 though, all four nodes must be fault-free and
we are done. If exactly one test resulted in 1, assume w.l.o.g. $(r_0,r_1) = 1$ and
note that $r_0$ and $r_3$ are fault-free and $r_1$ is faulty, so just use $r_3$ to diagnose $r_2$
and we are done. If exactly two tests resulted in 1, there are two cases. Either
two tests next to each other on the ring, or two node disjoint tests resulted in
1. Assume in the first case w.l.o.g. that $(r_0,r_1) = (r_1,r_2) = 1$ and note that $r_0$
and $r_3$ are fault-free and $r_1$ is faulty, so just use $r_3$ to diagnose $r_2$ and we are
done. In the second case, assume w.l.o.g. $(r_0,r_1) = (r_2,r_3) = 1$ and note that $r_0$
and $r_2$ are fault-free, whereas $r_1$ and $r_3$ are faulty. Finally, if exactly three tests
resulted in 1, assume w.l.o.g. $(r_0,r_1) = 0$ and note that $r_1$ is fault-free and $r_2$ is
faulty. Make the test $(r_1,r_0)$. If the result was 0, $r_0$ is fault-free and $r_3$ faulty,
otherwise $r_3$ is fault-free and $r_0$ is faulty. Either way, we are done. □
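
Some cases of this analysis are small enough to confirm by enumeration. The Python sketch below is an illustration with names of our own choosing: for a given list of outcomes of the four ring tests $(r_i, r_{i+1 \bmod 4})$ it lists every set of at most two faulty nodes that is consistent with them.

```python
from itertools import combinations


def consistent_fault_sets(results, max_faults=2):
    """Fault sets on the ring r_0..r_3 agreeing with results[i] = outcome of (r_i, r_{i+1 mod 4})."""
    found = []
    for size in range(max_faults + 1):
        for faulty in combinations(range(4), size):
            # only fault-free testers constrain the fault set
            ok = all(
                i in faulty or results[i] == (1 if (i + 1) % 4 in faulty else 0)
                for i in range(4)
            )
            if ok:
                found.append(set(faulty))
    return found
```

With all four outcomes equal to 1 the enumeration returns the two candidate sets named in the proof, so the ring is indeed undiagnosable from these tests alone; with all outcomes equal to 0 only the empty fault set remains.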

The previous lemma provides the fundamental brick in our inductive proof. 

Theorem 1. The hypercube $H_n$ of dimension $n \geq 3$ with at most $n$ faulty nodes
can be adaptively diagnosed in at most $2^n + n - 1$ tests if precisely $n$ nodes are
faulty, and in $2^n + l$ tests if $l < n$ nodes are faulty.

Proof. Let $(A, B)$ be a mirror decomposition of $H_n$. Since the hypercube is
hamiltonian, it is possible to find a node disjoint path $p = (v_1, v_2, .., v_{n+1})$ in $A$
when $n \geq 3$. Make the tests $(v_i, v_{i+1})$ along the path for $i = 1, 2, ..$ until either a
test resulted in 1, or all tests resulted in 0. In the first case, there is an $m \leq n$
s.t. $(v_m, v_{m+1}) = 1$ and $(v_i, v_{i+1}) = 0$ for $1 \leq i < m$. Thus, there must be at
least one faulty node in $A$ (one of $v_m$ and $v_{m+1}$), and therefore, there are at
most $n - 1$ faulty nodes in $B$. Diagnose $B$, using the induction thesis, or lemma
3 in the event of $n = 3$. If $B$ could not be diagnosed in the latter case, we
immediately know at least two fault-free nodes in $A$ since $B$ contains two faulty
nodes according to the proof of lemma 3. Use one of these to diagnose a node in
$B$ and thereby gain complete knowledge of the correctness of all four nodes in
$B$. Finally, let a fault-free node in $A$ test one of $v_m$ and $v_{m+1}$ in order to find out
which of them is faulty. At most nine tests are made this way and three faults
are located, which is consistent with the theorem. If $B$ can be diagnosed though,
the induction thesis for $n > 3$, and lemma 3 for $n = 3$, assures that at most
$2^{n-1} + p(B)$ tests were made diagnosing $B$. If precisely $n - 1$ faulty nodes were
found in $B$, we once again only need to find out which one of the nodes $v_m$ and
$v_{m+1}$ is faulty, which clearly can be made in less than $2^n + n - 1$ tests for $n \geq 3$. If
fewer than $n - 1$ faults were found though, apply lemma 2 with $T = (v_1, v_2, .., v_m)$
and $K = B$, and note that the assumptions of the second statement of lemma
2 are fulfilled since $(v_m, v_{m+1}) = 1$. Thus the number of tests made is at most
$2^n + p(V)$ when $p(V) < n$, and $2^n + n - 1$ when $p(V) = n$, as claimed. One case
remains to be proven, namely when all the initial $n$ tests along $p$ returned 0.
But according to corollary 1, $v_{n+1}$ in this case must be fault-free. Apply lemma
2 with $T = (v_1, v_2, .., v_n)$ and $K = \{v_{n+1}\}$. The first statement of the lemma
assures that at most $2^n + p(V)$ tests are made when $p(V) < n$, and $2^n + n - 1$
tests when $p(V) = n$. □

5 Adaptive Fault Diagnosis in 4 Rounds 

The efficiency of adaptive fault diagnosis can of course be measured by counting
the number of tests needed as in the previous section. For practical purposes,
this is a poor measure, though. A more adequate one, due to the parallel
implementation possibilities, is the number of testing rounds needed to diagnose 
the graph. In each round, every node is allowed to participate in at most one 
test, either as a tester or a testee. In (2j, it was shown that the complete graph 
with a majority of fault-free nodes, can be diagnosed in 10 rounds. |H1 showed 
that for the hypercube of dimension n, with at most n faulty nodes, 11 rounds 
suffice. The best lower bound, to our knowledge, is that 2 rounds are not enough:
the number of tests made after 2 rounds is at most $2^n$, which violates the lower
bound of $2^n + n - 1$ tests for $n > 1$. However, we present a diagnosing scheme
in 4 rounds.



5.1 Some Definitions and a Lemma 

Two arcs $a = (u_1,u_2)$ and $b = (u_3,u_4)$ are said to overlap if $u_i = u_j$ for some
$1 \leq i \leq 2$, $3 \leq j \leq 4$. A 3-round testing scheme of a graph $G = (V, E)$ is a tuple
$(T, R)$ where $T = (V,A)$ is a directed subgraph of $G$, and $R : A \to \{1..3\}$ is
an arc colouring function s.t. for all distinct pairs of overlapping arcs $a,b \in A$,
$R(a) \neq R(b)$. Define the $i$th round of $(T,R)$ as $A_i = \{a \mid a \in A, R(a) = i\}$.
Let $H = (V_1,E_1)$ be a subgraph of $G$, then $T|H$ is the subgraph of $T$ on $V_1$
containing those arcs $(u,v) \in A$ s.t. $(u,v) \in E_1$. We say that a 3-round testing
scheme $(T_n,R)$ for $H_n$ is recursively hamiltonian for $n \geq 2$ if

- $T_n$ contains a hamiltonian cycle $C \subseteq A$ for $H_n$.

- There are at least two distinct arcs $a_1, a_2 \in A_2 \cap C$ s.t. no arc in $A_3$ overlaps
$a_1$ or $a_2$.

- If $n > 2$, there is a mirror decomposition $(A,B)$ of $H_n$ s.t. $(T_n|A, R)$ and
$(T_n|B, R)$ are recursively hamiltonian.

The first and third property ensure the existences of hamiltonian cycles which 
we will use in our algorithm. The second property is merely added to show the 
existence of a 3-round testing scheme having the other two. 

Lemma 4. There is a recursively hamiltonian 3-round testing scheme $(T_n,R)$
for all hypercubes $H_n$ for $n \geq 2$.

Proof. For $H_2$, we let $T_2$ consist of a directed cycle of arcs $a_1, a_2, a_3, a_4$ on its 4
nodes. We define $R(a_1) = R(a_3) = 1$, and $R(a_2) = R(a_4) = 2$. It is easy to verify
that the three properties of the definition for recursively hamiltonian hold for
this construction. We proceed by induction on $n$. Assume there is a recursively
hamiltonian 3-round testing scheme $S_{n-1} = (T_{n-1},R)$ for $H_{n-1}$ and construct
one $S_n = (T_n,R)$ for $H_n$ as follows. Let $(A, B)$ be a mirror decomposition of $H_n$.
Use the description of $S_{n-1}$ to build recursively hamiltonian 3-round schemes
$S_A = (T_A,R)$ for $A$, and $S_B = (T_B,R)$ for $B$, s.t. if $(u_1,u_2)$ is an arc in $T_A$,
$(A2B(u_2), A2B(u_1))$ is an arc in $T_B$. By the mirror symmetry, it is clear that
if $(v_1,v_2)$ is an arc in $T_B$, $(B2A(v_2), B2A(v_1))$ is an arc in $T_A$. Furthermore,
choose the colouring function $R$ so that $R((u_1,u_2)) = R((A2B(u_2), A2B(u_1)))$,
for all arcs $(u_1,u_2)$ in $T_A$. In words, this means that we embed the recursively
hamiltonian 3-round scheme in $A$ and $B$ in opposite directions, but otherwise
identically. The second property of the definition for recursively hamiltonian
states that there are two arcs $a_1, a_2$ in $T_A$ on a hamiltonian cycle of $A$, whose
endpoints do not belong to any arcs $a$ with $R(a) = 3$. Our embeddings $T_A$ and
$T_B$ ensure that for $a_1 = (u_1,u_2)$, $b_1 = (A2B(u_2), A2B(u_1))$ belongs to $T_B$, with
$R(a_1) = R(b_1)$, and $b_1$ is an arc having the second property in the definition of
recursively hamiltonian. Similarly, we can construct $b_2$ from $a_2$. Thus none of
the nodes $u_1$, $u_2$, $A2B(u_2)$, or $A2B(u_1)$ are part of an arc $a$ with $R(a) = 3$, so
we may add the arcs $x_1 = (u_1, A2B(u_1))$ and $x_2 = (A2B(u_2), u_2)$, and define
$R(x_1) = R(x_2) = 3$, to conclude that $T_n = T_A + T_B + x_1 + x_2$ is hamiltonian.
The arcs $a_2$ and $b_2$ obey the second property of recursively hamiltonian, and the
third one follows from our inductive construction. □
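
The inductive construction can be carried out mechanically. The Python sketch below follows the proof step by step under an assumed labelling (nodes are the integers $0, \ldots, 2^n - 1$, $A$ is the half with the top bit cleared, and $A2B(u)$ sets that bit); the function names and the concrete choice of the base cycle are ours.

```python
def build_scheme(n):
    """Return (colour, cycle, free) for H_n, n >= 2.

    colour: dict mapping each arc (u, v) to its round 1..3
    cycle : nodes in the order of a directed hamiltonian cycle of the scheme
    free  : two round-2 arcs on the cycle whose endpoints touch no round-3 arc
    """
    if n == 2:
        colour = {(0, 1): 1, (1, 3): 2, (3, 2): 1, (2, 0): 2}
        return colour, [0, 1, 3, 2], [(1, 3), (2, 0)]
    col_a, cyc_a, (a1, a2) = build_scheme(n - 1)
    bit = 1 << (n - 1)                      # A2B(u) = u | bit (assumed labelling)
    colour = dict(col_a)
    for (u, v), c in col_a.items():         # copy in B with every arc reversed
        colour[(v | bit, u | bit)] = c
    u1, u2 = a1
    colour[(u1, u1 | bit)] = 3              # x1
    colour[(u2 | bit, u2)] = 3              # x2
    i = cyc_a.index(u2)
    path_a = cyc_a[i:] + cyc_a[:i]          # u2 ... u1 along the cycle of A
    cyc_b = [u | bit for u in reversed(cyc_a)]
    j = cyc_b.index(u1 | bit)
    path_b = cyc_b[j:] + cyc_b[:j]          # A2B(u1) ... A2B(u2) along the cycle of B
    b2 = (a2[1] | bit, a2[0] | bit)
    return colour, path_a + path_b, [a2, b2]


def check_scheme(n):
    """Verify the scheme for H_n and return its number of arcs."""
    colour, cycle, free = build_scheme(n)
    # every arc joins two nodes differing in exactly one bit
    assert all(bin(u ^ v).count("1") == 1 for u, v in colour)
    # overlapping arcs never share a round
    seen = set()
    for (u, v), c in colour.items():
        assert (u, c) not in seen and (v, c) not in seen
        seen.update({(u, c), (v, c)})
    # the cycle visits every node once and uses only arcs of the scheme
    assert sorted(cycle) == list(range(2 ** n))
    assert all((cycle[k], cycle[(k + 1) % len(cycle)]) in colour
               for k in range(len(cycle)))
    return len(colour)
```

Under these assumptions the scheme for $H_n$ contains $3 \cdot 2^{n-1} - 2$ arcs, since each induction step doubles the arcs and adds $x_1$ and $x_2$.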



5.2 The Main Result 

Our algorithm to diagnose the hypercube in 4 rounds is partially static, since 
the first three rounds are always the same. The tests of the fourth round though, 
may be completely different depending on the outcomes of the tests scheduled 
earlier. 

Theorem 2. The hypercube Hn for n > 3 can be adaptively diagnosed in 4 
testing rounds. 

Proof. Construct a recursively hamiltonian 3-round testing scheme $S_n = (T_n, R)$
for $H_n$ from lemma 4. Divide the arcs of $T_n$ into its $i$th round components
$A_i^{T_n} = \{a \mid a \in T_n, R(a) = i\}$ and make the tests corresponding to arcs in $A_i^{T_n}$
in round $i$. To put it more formally, if $(u_1,u_2) \in A_i^{T_n}$ then the test $(u_1,u_2)$ is
made in the $i$th testing round. By the definition of a 3-round testing scheme,
no two tests in the same round use the same node. By the third property of the
definition of recursively hamiltonian, we know there is a mirror decomposition
$(A,B)$ s.t. the tests made in $A$ ($B$) form a recursively hamiltonian 3-round
testing scheme for $A$ ($B$). Thus the first property of the definition for recursively
hamiltonian ensures that among the tests made, there are hamiltonian cycles of
tests $C_A$ in $A$, and $C_B$ in $B$. If all tests along $C_A$ ($C_B$) resulted in 0, we know
from corollary 1 that all nodes along the cycle, i.e. all of $A$ ($B$), are fault-free. In
this event, assume w.l.o.g. that $C_A$ was fault-free, then we can schedule the tests
$(a, A2B(a))$ for all nodes $a$ in $A$ in the fourth round. Since all nodes in $A$ are
fault-free, we know the correctness of both $A$ and $B$ after the fourth round. On
the other hand, if there were tests $t_A$ along $C_A$, and $t_B$ along $C_B$ both resulting
in 1, we know there must be faulty nodes in both $A$ and $B$. Hence there are at
most $n - 1$ faulty nodes in each of $A$ and $B$, and we can use recursion on $A$ and
$B$ to find out which tests are to be carried out in round 4. The only problem left
is the bottom of the recursion when $A$ and $B$ are of dimension 2. A case analysis
very similar to the one in the proof of lemma 3, which is omitted, shows us how
to overcome this obstacle. □
6 Conclusions

We have shown that locating faulty processors in a multiprocessor system of
$2^n$ processors, of which at most $n$ are faulty, can be done as efficiently in the
hypercube system as in the complete system. Still, removal of any connection
between some pair of processors from the hypercube system leaves the system
undiagnosable. By undiagnosable we mean that it may be that even if you make
all possible tests in the system, there is some processor whose fault status you
cannot decide. This is because the two processors at the end of the removed connection
have lower degree than the number of possible faulty processors, which
violates the second necessary condition for diagnosability mentioned in
the historical notes section. In this sense, the hypercube structure is optimal for
the adaptive diagnosis problem.

We also showed that it is possible to schedule tests in just 4 testing rounds,
to adaptively diagnose the hypercube. For the complete system on $2^n$ processors
of which at most $n$ are faulty, this can be strengthened to 3 rounds. Simply let
the first two rounds consist of many cycles of length greater than $n$. Many
of these will be found fault-free and can be used in the third round to diagnose
the other cycles. It is still open whether 3 testing rounds are sufficient for the
hypercube system. It should be noted that the number of tests in our 4 round 
testing scheme may be as many as 2"“''^, whereas the 11 round construction in 
0 uses at most 2" -|- (n -I- 1)^ tests. 
Abstract. In this paper we construct sorting comparator networks
which correct a fixed number t of faults in a sorted sequence of length
N. We study two kinds of such networks. One construction yields a fault
tolerant unit that, attached at the end of any comparator sorting network,
makes the whole network a sorting one resistant to t passive faults. The
second network can be used to ‘repair’ a sorted sequence in which at most
t entries were changed (no fault tolerance is required). The new results of
this paper are constructions of comparator networks of depth $1.44 \cdot \log N$
for these problems, which is less than the depths of networks described by
previous authors. The construction of the networks is practical
for small t. The numbers of comparators used by our networks are shown
to be reducible to values optimal up to a constant factor.



1 Introduction 

Sorting is one of the most fundamental problems of computer science. A classical 
approach to sort a sequence of keys is to apply a comparator network. Apart 
from a long tradition, comparator networks are particularly interesting due to 
potential hardware implementations. They can also be implemented as sorting
algorithms for parallel computers. 

In our approach sorted elements are stored in registers $r_1, r_2, \ldots, r_N$. Registers
can be indexed with integers or elements of other linearly ordered sets. In
this paper a convenient convention is indexing registers with sequences of integers
$x = (x_1, x_2, \ldots, x_k)$ ordered lexicographically. A set of all registers having
the same first coordinate $x_1$ is called the row labeled $x_1$. A set of all registers having
the same all but first coordinates we call the column labeled with the sequence of
fixed coordinates. The first coordinate of a register we call its level in the column.
We define the operation $\circ$ on sequences of integers

$(x_1, \ldots, x_k) \circ (y_1, \ldots, y_l) = (x_1, \ldots, x_k, y_1, \ldots, y_l)$.

By $|x|$ we denote the length of $x$.

* Partially supported by KBN grant 8 T11C 032 15 and by University of Wroclaw

A comparator [i : j] is a simple device connecting registers $r_i$ and $r_j$ ($i < j$).
It compares the numbers they contain and if the number in $r_i$ is bigger, it swaps
them. The general problem is the following. At the beginning of the computations
the input sequence of keys is placed in the registers. Our task is to sort the 
sequence of keys according to the linear order of register indexes applying a 
sequence of comparators. The sequence of comparators is determined before the 
computations. We assume that comparators connecting disjoint pairs of registers 
can work in parallel. Thus we arrange this sequence of comparators into a series 
of comparator layers which are sets of comparators connecting disjoint pairs 
of registers. The total time needed by a comparator network to perform its 
computations is proportional to the number of layers of the network called its 
depth. 
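
The model can be made concrete with a short Python sketch. It is an illustration only: registers are indexed from 0, a network is given as a list of pairs (i, j) with i < j, and the function names are ours. The second function packs the comparator sequence greedily into layers, so its result length is the depth of that particular arrangement.

```python
def apply_network(comparators, keys):
    """Run the comparators [i : j] in order; the smaller key ends up in register i."""
    regs = list(keys)
    for i, j in comparators:
        if regs[i] > regs[j]:
            regs[i], regs[j] = regs[j], regs[i]
    return regs


def layers(comparators):
    """Pack the sequence into layers of comparators on disjoint registers."""
    ready = {}                  # register -> first layer in which it is free again
    result = []
    for i, j in comparators:
        d = max(ready.get(i, 0), ready.get(j, 0))
        if d == len(result):
            result.append([])
        result[d].append((i, j))
        ready[i] = ready[j] = d + 1
    return result
```

For example, the five comparators [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)] sort four keys and are packed into three layers.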

Much research concerning sorting networks has been done in the past. Its
main goals were to minimize the depth and the total number of comparators.
The most famous results are the asymptotically optimal AKS sorting network of
depth $O(\log N)$ and the more ‘practical’ Batcher network of depth $\sim \frac{1}{2} \log^2 N$ (all
logarithms in this paper are of base 2). Another well known result we are going
to apply in this paper is Yao’s construction of an almost optimal network
to select the t smallest (or largest) entries of a given input of size N (t-selection
problem). His network has depth $\log N + (1 + o(1)) \log t \log\log N$ and $\sim N \log t$
comparators, which matches lower bounds for that problem ($t \ll N$).

In this paper we deal with two problems concerning comparator networks.
One of them is to construct a comparator network which is a unit correcting t
passive faults (see [2]) in any sorting network (some comparators are faulty and
do nothing). Such a unit can be attached to any sorting network, e.g. AKS (as a
number of its last layers), so that the whole network is a sorting one resistant to
t faults. The unit has to correct all the faults present in the sorting network and
be resistant to all errors present in itself. Such a correcting unit we call a t-fault-tolerant
network. The best result concerning such networks is that of Piotrow,
who constructed an asymptotically optimal network of depth $O(\log N + t)$ having
$O(Nt)$ comparators. The exact constants hidden behind these big O's were not
determined, but since Piotrow uses network jS] the constant in front of $\log N$ in
$O(\log N + t)$ is at least 2.

The other problem is to sort an almost sorted sequence. Let us consider 
for example a large sorted database with N entries. In some period of time 
we change t entries and want to have it sorted back. We design a specialized 
comparator network of a small depth to ‘repair’ the ordering and avoid using 
costly general sorting networks. Such a network to sort back a sorted sequence 
in which at most t changes were made we call a t-correction network. The best
known general result here is the network of Kik, Kutylowski, Piotrow of depth
$4 \log N + O(\log^2 t \log\log N)$.

The networks in PI m are based on a nice construction by Schimmler and 
Starke ^ of a 1-correction network of depth 2 log N having 3.5N comparators. 
Our goal is to reduce the constant in front of log N in the depth, which is most 
essential if t is small and N big. We present t-fault-tolerant and t-correction 
networks that for fixed t have depths $\sim \alpha \log N = \log_{\frac{1+\sqrt{5}}{2}} N$, where

$\alpha = \frac{1}{\log_2 \frac{1+\sqrt{5}}{2}} = 1.44\ldots$

This way our networks have smaller depths than any correction networks described
by previous authors.

For sorting networks the following useful lemma called Zero-One Principle 
holds: 



Lemma 1 ((zero— one principle)). A comparator network is a sorting net- 
work if and only if it can sort any input consisting only of O’s and 1 ’s. 

This lemma is the reason, why from now on we only consider inputs consisting 
only of O’s and I’s. Below we formulate analogous lemmas for fault tolerant and 
correction networks. 
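
As an illustration of the zero-one principle, the following minimal sketch tests a comparator network on all 0-1 inputs. The representation of a network as a list of layers of index pairs, the function names and the odd-even transposition example are assumptions made for this sketch and are not part of the paper.

```python
from itertools import product


def apply_network(layers, values):
    """Run a comparator network: comparator (i, j) with i < j leaves
    the smaller value in register i and the larger one in register j."""
    regs = list(values)
    for layer in layers:
        for i, j in layer:
            if regs[i] > regs[j]:
                regs[i], regs[j] = regs[j], regs[i]
    return regs


def sorts_all_zero_one_inputs(layers, n):
    """Zero-one principle test: try every 0-1 input of length n."""
    return all(
        apply_network(layers, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )


# Hypothetical example network: odd-even transposition sort on 4 registers.
odd_even_4 = [[(0, 1), (2, 3)], [(1, 2)], [(0, 1), (2, 3)], [(1, 2)]]
```

Checking the 2^n inputs of 0's and 1's is enough to conclude that the network sorts arbitrary inputs of length n.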

We say a 0 (1) is disturbed if it is changed to 1 (0). The resulting 1 (0) we call
displaced. A sequence of 0's and 1's produced from a sorted sequence by disturbing
t or fewer entries we call t-disturbed.



Lemma 2. A comparator network is a t-fault-tolerant network if and only if
for any $x \le t$ it can sort any x-disturbed input if we remove any set of $t - x$
comparators.



Lemma 3. A comparator network is a t-correction network if and only if it can 
sort any t-disturbed input. 

We define the dirty area for 0-1 sequences contained in the registers during the
computations of a comparator network. The dirty area is the minimal set of subsequent
registers such that below these registers (in registers with lower indexes) there
are only 0's and above them there are only 1's. A t-disturbed input in which only 0's
are disturbed we call t-partially-disturbed. A comparator network that can reduce
the dirty area size to at most $\Delta$ for any x-partially-disturbed input while having $t - x$
faulty comparators we call $(t, \Delta)$-partial-fault-tolerant. A comparator network that
can reduce the dirty area size to at most $\Delta$ for any t-partially-disturbed input we call
$(t, \Delta)$-partial-correction. For both networks the output is t-partially-disturbed,
because a 1 can only increase the index of its register during computations. The
final size of the dirty area $\Delta = \Delta(N, t)$ is some function of N and t.
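
The definition of the dirty area can be made concrete with a small helper. This is only a sketch: the function name and the 0-based register indexing are assumptions of the example, not notation from the paper.

```python
def dirty_area(bits):
    """Return (start, size) of the dirty area of a 0-1 sequence: the
    shortest run of registers with only 0's below it and only 1's above it.
    Registers are indexed from 0 here."""
    n = len(bits)
    first_one = next((k for k, b in enumerate(bits) if b == 1), n)
    last_zero = max((k for k, b in enumerate(bits) if b == 0), default=-1)
    if first_one > last_zero:  # already sorted: the dirty area is empty
        return first_one, 0
    return first_one, last_zero - first_one + 1
```

For example, the sequence 0 1 0 0 1 1 has a dirty area of three registers starting at index 1.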



2 One Disturbed Position 

In this section we consider sorting of 1-disturbed inputs. For simplicity we assume
the input is 1-partially-disturbed, i.e. it has a single disturbed 0, so we have one
displaced 1 at this position. We describe a comparator network $F_N$ correcting
any 1-partially-disturbed input of size N. Due to the symmetry of the network the
case of a displaced 0 follows directly from this case.

First we recall the definition of the Fibonacci numbers $f_k$:

$$f_0 = f_1 = 1, \qquad f_k = f_{k-2} + f_{k-1}.$$

We define the numbers $\varphi_k$, $\psi_k$ and $\vartheta_k$, which behave similarly to $f_k$:

$$\psi_0 = \psi_1 = \varphi_0 = \varphi_1 = 1,$$
$$\psi_k = \varphi_{k-2} + \psi_{k-1},$$
$$\varphi_k = \text{the largest odd number smaller than or equal to } \psi_k,$$
$$\vartheta_k = 2\psi_k - \varphi_k.$$

Let LG(n) be the smallest p such that $\varphi_p > n$. Let $r_1, r_2, \ldots$ be registers.
The network $F_N$ consists of d = LG(N) subsequent layers $L_1, L_2, \ldots, L_d$ such
that:

$$L_p = \{[2i + p : 2i + p + \varphi_{d-p}] \mid i \in \mathbb{Z}\}.$$

The way we define $L_p$ requires a few words of comment. From all comparators
in the definition of $L_p$ only those exist whose end registers are well defined and
belong to the set of all registers. This convention is maintained for the rest of the
paper. Now we prove that the comparator network defined above is indeed a 1-correction
network and estimate how many layers it has asymptotically.

Fact 1 $d = \mathrm{LG}(N) \sim \log_{(1+\sqrt{5})/2} N = \alpha \log N$.

Proof. The Fact follows directly from the inequalities $f_{k-1} \le \varphi_k \le f_k$, which can be
easily proven.
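
The growth of LG(N) can be checked numerically. The sketch below assumes the recurrences psi_k = phi_{k-2} + psi_{k-1} and phi_k = largest odd number not exceeding psi_k, with LG(n) the smallest p such that phi_p > n; the function names are chosen for this example only.

```python
import math

ALPHA = 1 / math.log2((1 + math.sqrt(5)) / 2)  # about 1.44


def phi_psi(count):
    """Return the lists (phi_0..phi_{count-1}, psi_0..psi_{count-1})."""
    phi, psi = [1, 1], [1, 1]
    while len(phi) < count:
        k = len(phi)
        psi.append(phi[k - 2] + psi[k - 1])
        # phi_k is the largest odd number not exceeding psi_k
        phi.append(psi[k] if psi[k] % 2 == 1 else psi[k] - 1)
    return phi[:count], psi[:count]


def lg(n):
    """LG(n): the smallest p such that phi_p > n."""
    count = 4
    while phi_psi(count)[0][-1] <= n:
        count *= 2
    phi = phi_psi(count)[0]
    return next(p for p, value in enumerate(phi) if value > n)


def fibonacci_bounds_hold(limit):
    """Check f_{k-1} <= phi_k <= f_k for 1 <= k < limit (f_0 = f_1 = 1)."""
    fib = [1, 1]
    while len(fib) < limit:
        fib.append(fib[-2] + fib[-1])
    phi = phi_psi(limit)[0]
    return all(fib[k - 1] <= phi[k] <= fib[k] for k in range(1, limit))
```

Comparing `lg(n)` with `ALPHA * math.log2(n)` for growing n shows the ratio approaching 1, in line with Fact 1.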

To see how the network works we first introduce some definitions. The highest
register containing a 0 of the input we call the border. The distance between the displaced 1
and the border is the difference between the indexes of the border and of the register
containing the displaced 1. Our network proves to reduce this distance very efficiently. It ends
its computations when the distance is guaranteed to be 0. The fact that the network
$F_N$ is really a 1-correction network follows directly from the following lemma
applied to layer d.

Lemma 4. After applying the first l layers of $F_N$ the distance between the single
displaced 1 and the border is smaller than $\psi_{d-l+1}$. If this 1 is in a register $r_i$ for
which $i \equiv l \pmod 2$, then this distance is smaller than $\varphi_{d-l}$.

Proof. We proceed by induction on l. For l = 0 the lemma is obvious, because
$\varphi_d > N$. Assume that the lemma holds for $l - 1$ and prove it for l. Let i be
the index of the register containing the displaced 1 just before we apply layer l. If
the displaced 1 is not moved by layer l, then two cases are possible. The first
is $i \not\equiv l \pmod 2$, and from the inductive hypothesis for $l - 1$ the distance is smaller than
$\varphi_{d-l+1}$, which is not bigger than $\psi_{d-l+1}$. In the second case, $i \equiv l \pmod 2$, the fact that
the 1 is not moved by layer l means that the distance is smaller than $\varphi_{d-l}$. If this
1 is moved, then its distance is reduced by $\varphi_{d-l}$ from some value smaller than
$\psi_{d-l+2}$. Thus after applying layer l this distance is smaller than

$$\psi_{d-l+2} - \varphi_{d-l} = \psi_{d-l+1}.$$



As we see from the last lemma, at any moment of the computations of network
$F_N$ there is a set of registers below the border such that a single displaced 1
contained in one of these registers is guaranteed to get to the border by the end of
the computations. After the first l layers this set contains all registers $r_i$ below
the border (registers with lower indexes) at a distance from the border
smaller than

- $\psi_{d-l+1}$ if $i \not\equiv l \pmod 2$,
- $\varphi_{d-l}$ if $i \equiv l \pmod 2$.

We call this set the correction area. In later considerations we set the border
register somewhere below the highest 0. It is easy to see that in such a case a
single displaced 1 in the correction area is also guaranteed to get to the border or
to some higher register.

The last lemma obviously implies the following corollary, since at the end of the
computations of $F_N$ the distance is zero (and due to the symmetry of the network):

Corollary 1. The comparator network $F_N$ is a 1-correction network of depth $\alpha \log N$.
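
The construction of this section can be sketched and tested by brute force for small N. The code assumes the definitions as read above: psi_k = phi_{k-2} + psi_{k-1}, phi_k the largest odd number not exceeding psi_k, d = LG(N) the smallest p with phi_p > N, and layer L_p made of the comparators [2i + p : 2i + p + phi_{d-p}] that fit into registers 1..N. All function names are hypothetical. Only the case of a single displaced 1 is tested, which is the case covered by Lemma 4.

```python
def phi_numbers(count):
    """phi_0..phi_{count-1} (psi_0 = psi_1 = phi_0 = phi_1 = 1)."""
    phi, psi = [1, 1], [1, 1]
    while len(phi) < count:
        k = len(phi)
        psi.append(phi[k - 2] + psi[k - 1])
        phi.append(psi[k] if psi[k] % 2 == 1 else psi[k] - 1)
    return phi[:count]


def fibonacci_network(n):
    """Layers L_1..L_d of F_n on registers 1..n as lists of (low, high) pairs."""
    d = 0
    while phi_numbers(d + 1)[d] <= n:  # d = LG(n): smallest p with phi_p > n
        d += 1
    phi = phi_numbers(d + 1)
    layers = []
    for p in range(1, d + 1):
        step = phi[d - p]
        first = 2 - (p % 2)  # smallest register index >= 1 with the parity of p
        # keep only comparators whose both ends are existing registers
        layers.append([(a, a + step) for a in range(first, n - step + 1, 2)])
    return layers


def run(layers, bits):
    """Apply the layers to a 0-1 input; registers are 1-based internally."""
    regs = [None] + list(bits)
    for layer in layers:
        for low, high in layer:
            if regs[low] > regs[high]:
                regs[low], regs[high] = regs[high], regs[low]
    return regs[1:]


def corrects_single_displaced_one(n):
    """Check every sorted 0-1 input of size n with one 0 changed to 1."""
    layers = fibonacci_network(n)
    for zeros in range(n + 1):
        for pos in range(zeros):
            bits = [0] * zeros + [1] * (n - zeros)
            bits[pos] = 1
            if run(layers, bits) != sorted(bits):
                return False
    return True
```

For N = 8 this yields six layers, and calling `corrects_single_displaced_one(n)` for small n exercises the induction of Lemma 4 on concrete inputs.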

3 Partial Fault Tolerant Network 

In this section we define a $(t, t^2 + t)$-partial-fault-tolerant network $T(s, N, t)$ of
a small depth. The network is constructed for a parameter s being an arbitrary
integer constant. Later in this paper we show how, having this network, we can
easily produce a t-fault-tolerant network of a similar depth.

The main idea of the construction of $T(\cdot)$ is to apply a number of networks $F_{N'}$.
As we proved, $F_{N'}$ guarantees sorting any input with a single displaced 1. If the
number of displaced 1's is bigger, they can prevent each other from getting to the border.
At least one of them gets to the border, but the others do not have to. We solve this
problem by moving displaced 1's that drop out of the correction area to another $F_{N'}$
network, slightly delayed in comparison to the previous one. This way 1's that
lost their chance to get to the border in one $F_{N'}$ regain it in another $F_{N'}$.

Now we describe the whole network in a more formal manner. Register indexes
are ordered pairs $(i, j)$ for $i \in \{1, \ldots, N/t\}$, $j \in \{1, \ldots, t\}$. It is easy to see
that if we change all displaced 1's to 0's then in each column we have the highest
0 almost at the same level in its column (the levels can differ by 1). Network
$T(s, N, t)$ consists of two parts. The first part is preprocessing consisting of the
sequence of layers:

$$P_1, P_2, \ldots, P_{st},$$

where

$$P_q = \{[(i, 2j + q) : (i, 2j + q + 1)] \mid i, j \in \mathbb{Z}\}.$$

Fact 2 After applying the first part of $T(\cdot)$ to an x-partially-disturbed input all
displaced 1's are pushed to registers with the biggest possible coordinate j (if the
number of faults in it is not bigger than $t - x$). In other words: if $r(i, j)$ contains
a displaced 1, then $r(i, j + 1)$ also contains a displaced 1.

Let $d = \mathrm{LG}(N/t)$. The second part of the network does the main work and
is the sequence of layers:

$$L_1, L_2, \ldots, L_s, C_s, L_{s+1}, \ldots, L_{2s}, C_{2s}, L_{2s+1}, \ldots, L_{3s}, C_{3s}, L_{3s+1}, \ldots$$

where

$$L_p = \{[(2i + p, j) : (2i + p + \varphi_{d-p+2s(t-j)}, j)] \mid i, j \in \mathbb{Z}\}$$

and

$$C_q = \{[(2i + q + 1, j) : (2i + q + \vartheta_{d-q+2s(t-j)+1}, j - 1)] \mid i, j \in \mathbb{Z}\}.$$

From now on the numbers defined above are taken to be 1 for k < 0. The second part of
network $T(s, N, t)$ has altogether $(1 + 1/s)\mathrm{LG}(N) + 2(s + 1)(t - 1)$ layers. As we see,
the layers $L_p$ are layers of $F_{N'}$ inside columns, and the layers $C_q$, roughly speaking,
move displaced 1's to the next delayed column when they are beyond the correction areas
in their columns.

Now we analyze what happens to an x-partially-disturbed input in the network
$T(\cdot)$. We assign to each 1 during the computations the property of being or not
being active. Just after the first part we set each displaced 1 to be active and
each 1 that is not displaced to be not active. We define a parameter b. At the beginning
b is the index of the level of the register containing the highest 0 in column t if we
change all displaced 1's to 0's. An active 1 stops being active at the moment a
comparator moves it to a level i > b. At the same moment b is decreased by one. To
each displaced 1 we assign an integer value v. First we define the destination column
index u of a displaced 1 at a given moment of the computations. To do so we change
all other active 1's to 0's, repair all faulty comparators in the network and fix
b as it is at this moment of the computations. Then we continue the computations of
the network. If the 1 gets to a level i > b, then the destination column index u is the
index of the column from which a comparator moves it to this level. If the displaced
1 does not get to such a level at all, then the index u is set to be 0. The current
value v of an active 1 is equal to the minimum of all values u assigned to this 1
up to the considered moment of the computations. When the 1 stops being active, its
value remains unchanged till the end of the computations. It is not hard to see from
the definition that at the beginning of the second part each 1 has a value equal
to the index of its column.

The following facts describe the behavior of the values assigned to 1's:

Fact 3 If an active 1 is stopped by a passive fault, then its value decreases by 1
or does not change.

Proof. We prove that its destination column index does not change or decreases
by 1. We trace the stopped 1 for 2s + 1 layers after the stop, repairing all the faults it
can encounter and changing all other active 1's to 0's. Even if the destination column
index u of this 1 before it is stopped is equal to the label of its column, after it is
stopped the index is smaller. A single displaced 1 that has such a smaller u is, after
layer $L_q$, in a register $r_{2i+q+1}$ and, treated by layer $C_q$, moves to the next column
(having a label smaller by 1). It is easy to see that during these computations this
displaced 1 goes to the next column into a register on a higher level than at the
moment it was stopped. The next column, because of its delay, is, at the moment the
considered 1 gets to it, at the same or an earlier phase of its computations as the
column in which the stop occurred was at the moment of the stop. This proves that
the index u of the 1 does not decrease by more than one.



Fact 4 Assume an active 1 is stopped by another active 1 and its value decreases.
In such a case its value becomes not smaller than the value of the 1 causing
the delay, decreased by 1.

Proof. For a displaced 1 there is no difference between being stopped by a passive
fault and being stopped by another displaced 1. Just before one active 1 stops another,
they must have the same destination column index.



Lemma 5. Network $T(s, N, t)$ reduces the dirty area of any x-partially-disturbed
input ($x \le t$) to at most $t^2 + t$ registers if it has at most $t - x$ passive faults.

Proof. Putting together the facts, one can see that at the end of the computations
the 1's that were displaced at the beginning of the second part of the network have
values $v_1, v_2, \ldots, v_x$. Without loss of generality we can assume that they form
a nonincreasing sequence. Because of the Facts the difference $v_i - v_{i+1}$ is not
bigger than the number of faults encountered by the 1 with value $v_{i+1}$, increased by one.
Since $v_1 = t$, we have that $v_x \ge 1$ and the 1 having value $v_x$ is not active at the end of the
second part. This gives a dirty area of size not bigger than $t^2 + t$, since b is decreased
x times during the computations.



4 Partial Correction Network 

Now we define a $(t, c\,t(\log N)^{c_s \log t})$-partial-correction network $C(s, N, t)$, where
s is an integer constant and $c_s$ depends on s. This network has depth
$\alpha(1 + 1/s) \log N + c_s(1 + o(1)) \log t \log\log N$. We show later in this paper how from
this network we can obtain a t-correction network of almost the same depth. For
this section we change the notation $\psi_i$ to $\psi(i)$ (the same for $\varphi$ and $\vartheta$).

Before we begin to construct the network $C(\cdot)$ we prove a lemma about the
network $F_N$ on which the construction is based. As we know, network $F_N$
successfully corrects one displaced 1. The lemma describes its behavior if the number
of displaced 1's is bigger.

Lemma 6. Assume that at a given moment of the computations not less than t
displaced 1's are in the correction area of $F_N$. In such a case after the next layer
of $F_N$ at least t/2 displaced 1's are in the correction area. Consequently after s
layers at least $t/2^s$ displaced 1's remain in the correction area.

Proof. The reason a displaced 1 can drop out of the correction area is that a
comparison between this 1 and another displaced 1 is made. In such a case the other 1
remains in the correction area.

The main idea of the construction of $C(\cdot)$ is to have a number of disjoint networks
$F_{N'}$. At the beginning all displaced 1's are moved to a few networks $F_{N'}$ (the others
become free from displaced 1's). Every s steps the displaced 1's that drop out of the
correction area in one $F_{N'}$ are moved to another $F_{N'}$ not previously containing
any displaced 1's. These moved 1's are in the correction area of their new network
$F_{N'}$, because the new $F_{N'}$ is delayed by s + 1 steps. In the delayed $F_{N'}$ there is
at most a fraction $1 - 1/2^s$ of the displaced 1's from the previous $F_{N'}$. Thus the total
delay cannot grow very much, because in the subsequent networks $F_{N'}$ the maximal
numbers of displaced 1's go down exponentially. In fact this idea is similar to that
applied in [3]. The changes consist in applying the $F_N$ network and putting $C_q$ layers
not every second step, but less frequently. The following simple combinatorial
fact tells us that in our construction the number of networks $F_{N'}$ is small.

Fact 5 The number of nondecreasing sequences $j_1, j_2, \ldots, j_k$ for $0 \le k \le K$ and
$1 \le j_i \le J$ is equal to:

$$\binom{K+J}{J}.$$

Proof. The number is the same as the number of nondecreasing sequences of
integers $0 \le j_i \le J$ of length K, which is the same as the number of increasing
sequences of integers $1 \le j_i \le J + K$ of length K.
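
The count in Fact 5 can be confirmed by brute force for small parameters. In this sketch the closed form is taken to be the binomial coefficient C(K + J, J), which is what the bijection in the proof gives; the function name is an assumption of the example.

```python
from itertools import combinations_with_replacement
from math import comb


def count_nondecreasing(K, J):
    """Count nondecreasing sequences j_1 <= ... <= j_k with
    0 <= k <= K and 1 <= j_i <= J by explicit enumeration."""
    return sum(
        sum(1 for _ in combinations_with_replacement(range(1, J + 1), k))
        for k in range(K + 1)
    )
```

The empty sequence (k = 0) is counted once, which is why the total matches the binomial coefficient exactly.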

Now we define network $C(\cdot)$ in a more formal way. Let $K = -\log_{1-1/2^s} t$,
$J = [\mathrm{LG}(N)] + K$. In the network $C(s, N, t)$ indexes of registers have the form
$n' \circ j \circ (\tau + J)$. In this notation $\tau \in \{1, \ldots, t\}$, $j = (j_1, \ldots, j_k)$ is a nondecreasing
sequence of integers $j_i \in \{1, \ldots, J\}$ of length at most K and $n' \in \{1, \ldots, N'\}$,
where $N' = N / \left(\binom{K+J}{J} t\right)$. As in the case of $T(\cdot)$, we can change all displaced 1's into
0's. There are at most two (differing by 1) levels of the highest 0's in a column.
We treat the level just below these levels as the border between 0's and 1's for
the needs of this algorithm. We exclude from the considerations all displaced 1's
which are moved to registers above the border level.

The network $C(s, N, t)$ consists of two parts. The first part applies a selector
for the t largest entries [6] to each row of registers. After the first part, displaced
1's in all rows below the border get to registers $r(n', \tau + J)$. Indexes of these
registers are lexicographically biggest in each row. This first part has depth
$\sim c_s \log t \log\log N$. The constant $c_s$ grows as $2^s$.

Let $d = \mathrm{LG}(N')$. The second part consists of the sequence of layers

$$L_1, L_2, \ldots, L_s, C_s, L_s, L_{s+1}, \ldots, L_{2s}, C_{2s}, L_{2s+1}, \ldots, L_{3s}, C_{3s}, L_{3s+1}, \ldots$$

where

$$L_p = \{[(2i + p) \circ j \circ (\tau + J) : (2i + p + \varphi(d + (s + 1)|j| - p)) \circ j \circ (\tau + J)]\},$$

$$C_q = \{[(2i + q + 1) \circ j \circ (\tau + J) : (2i + q + 1 + \vartheta(d + (s + 1)|j| - q + 1)) \circ j \circ (q/s - |j|) \circ (\tau + J)]\}$$
$$\cup\ \{[(2i + q) \circ j \circ (\tau + J) : (2i + q + \varphi(d + (s + 1)|j| - q)) \circ j \circ (q/s - |j|) \circ (\tau + J)]\}.$$

Altogether we have $(1 + 1/s)(\mathrm{LG}(N') + K) + (s + 1)K$ layers in the second
part.

In this network again the layers $L_p$ represent layers of $F_{N'}$ inside the columns
(similarly to $T(\cdot)$). The layers $C_q$ represent transfers of displaced 1's that are beyond
the correction area to columns not containing displaced 1's. From the way the layers $C_q$
are defined we see that only one transfer to a given column can occur during
the whole time of the computations. Displaced 1's are transferred from column
$j \circ (\tau + J)$ to column $j' \circ (\tau + J)$ with $|j'| = |j| + 1$ (since $j' = j \circ (q/s - |j|)$). All
transferred 1's are, after the transfer, on a level at a distance not bigger than
$\psi(d + (s + 1)|j'| - q - 1)$ from the border level. Thus they are in the correction area
of their new column. At most a fraction $1 - 1/2^s$ of the displaced 1's is transferred.
Because of this, in general we have the following fact:

Fact 6 A column with the index $j \circ (\tau + J)$ contains not more than $t \cdot (1 - 1/2^s)^{|j|}$
displaced 1's.

The fact above is the reason why we do not need columns for $|j| > K$. Even if
they were present, no displaced 1's would get to them. Because all displaced 1's
are at the end of the computations on or above the border level, also the following fact
holds:

Fact 7 The second part of the network reduces the dirty area to at most three
rows.

This way the network $C(\cdot)$ reduces the dirty area to at most
$3t\binom{K+J}{J} = c\,t(\log N)^{c_s \log t}$ registers. This proves the following lemma:

Lemma 7. Comparator network $C(s, N, t)$ is a $(t, c\,t(\log N)^{c_s \log t})$-partial-correction
network for some constant $c_s$ depending on s. This network has depth

$$\alpha(1 + 1/s) \log N + c_s(1 + o(1)) \log t \log\log N.$$

5 Fault Tolerant and Correction Networks 

Now we show how, having partial-fault-tolerant and partial-correction networks,
we can obtain fault-tolerant and correction networks of almost the same depth.
The solutions presented in this section are intended to be as simple as possible,
and the author believes the reader can find solutions with slightly better constants.

A problem that is often encountered in the construction of comparator networks
is sorting inputs with dirty areas of small size. Assume we can reduce
the dirty area of a t-disturbed sequence of 0's and 1's to size $\Delta$. The question is how
many layers and comparators a comparator network needs to 'clean' this dirty
area. We have two versions of this question: one if we require fault-tolerance, the
other if we do not. This question is answered by the following easy-to-prove lemmas:

Lemma 8. Assume that there exists a t-fault-tolerant network $X_N$ that for input
size N has depth $\delta(N, t)$ and $\gamma(N, t)$ comparators. Then there exists a comparator
network that sorts any x-disturbed input with a dirty area of size at most $\Delta$ if
it has not more than $t - x$ faulty comparators. This network has depth $2\delta(2\Delta, t)$
and $\frac{N}{\Delta}\gamma(2\Delta, t)$ comparators.



Lemma 9. Assume that there exists a t-correction network $X_N$ that for input of
size N has depth $\delta(N, t)$ and $\gamma(N, t)$ comparators. Then there exists a comparator
network that sorts any t-disturbed input with a dirty area of size at most $\Delta$. This
network has depth $2\delta(2\Delta, t)$ and $\frac{N}{\Delta}\gamma(2\Delta, t)$ comparators.

Proof of both lemmas. We index the registers with integers 1, ..., N. The network
consists of two parts of $\delta(2\Delta, t)$ layers each. The first part consists of networks $X_{2\Delta}$
on each set of registers:

$$S_{2i} = \{r_{2i\Delta+1}, r_{2i\Delta+2}, \ldots, r_{2i\Delta+2\Delta}\}.$$

The second part consists of the networks $X_{2\Delta}$ on each set of registers:

$$S_{2i+1} = \{r_{(2i+1)\Delta+1}, r_{(2i+1)\Delta+2}, \ldots, r_{(2i+1)\Delta+2\Delta}\}.$$

This network cleans the dirty area because this area is contained in at least one $S_i$.
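
The two-part cleaning scheme from the proof can be sketched as follows. As a simplifying assumption, Python's `sorted` stands in for the network $X_{2\Delta}$ applied to one window (a real construction would use a comparator network there), registers are indexed from 0, and the function name is hypothetical.

```python
from itertools import product


def clean_dirty_area(bits, delta, sort_window=sorted):
    """Sort a 0-1 sequence whose dirty area has at most delta registers.
    Part one sorts the even windows S_0, S_2, ..., part two the odd windows
    S_1, S_3, ...; window S_k covers registers k*delta .. k*delta + 2*delta - 1."""
    regs = list(bits)
    n = len(regs)
    for parity in (0, 1):
        for k in range(parity, n // delta + 1, 2):
            low = k * delta
            high = min(low + 2 * delta, n)  # the top window may be truncated
            regs[low:high] = sort_window(regs[low:high])
    return regs


def all_small_dirty_inputs(n, delta):
    """Generate every 0-1 sequence of length n with dirty area size <= delta."""
    for zeros in range(n + 1):
        for width in range(min(delta, n - zeros) + 1):
            for middle in product((0, 1), repeat=width):
                yield [0] * zeros + list(middle) + [1] * (n - zeros - width)
```

Windows of the same parity are disjoint, so each part corresponds to networks working in parallel, which is why the depth is only twice that of a single $X_{2\Delta}$.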

Now, having these cleaning networks, we formulate the main result of this section.
We are going to prove that to produce a good t-fault-tolerant (-correction)
network it is enough to construct a $(t, \Delta)$-partial-fault-tolerant (-correction) network
$Y_N$ having a small depth and a reasonably small function $\Delta$, and a
t-fault-tolerant (-correction) network $X_N$ of not too big depth and with a small number of
comparators. In such a case we can construct a t-fault-tolerant (-correction) network
of almost the same depth as $Y_N$ and having, roughly speaking, twice as many
comparators as $X_N$ has. We call these reductions Refinement Lemmas. We formulate
and prove them at once for fault-tolerant and correction networks.



Lemma 10 (refinement lemma). Assume we have a comparator network
$Y_N$ which is a $(t, \Delta)$-partial-fault-tolerant (-correction) network of depth $\delta'(N, t)$
($\Delta = \Delta(N, t)$). We also have a t-fault-tolerant (-correction) network $X_N$ of depth
$\delta(N, t)$ and having $\gamma(N, t)$ comparators. Then for any M there exists a
t-fault-tolerant (-correction) network for any input size N of depth (with $\Delta = \Delta(Nt/M, t)$)

$$\delta(M, t) + \delta'\left(\frac{Nt}{M}, t\right) + 2\delta\left(\frac{4M\Delta}{t} + 2M, t\right)$$

with the number of comparators not bigger than:

$$\frac{N}{M}\gamma(M, t) + \frac{Nt}{M}\delta'\left(\frac{Nt}{M}, t\right) + \frac{Nt}{2M\Delta + Mt}\gamma\left(\frac{4M\Delta}{t} + 2M, t\right).$$

Proof. Let indexes of registers be pairs $(i, j)$ ($i \in \{1, \ldots, N/M\}$, $j \in \{1, \ldots, M\}$).
Our network consists of three main parts.

In the first part we apply $X_M$ in each row separately. This requires $\delta(M, t)$
layers and $\frac{N}{M}\gamma(M, t)$ comparators. The result of this part is that displaced 0's
are moved to the first t columns, and displaced 1's are moved to the last t columns
(except maybe in one row).

In the second part we use two copies of $Y_{Nt/M}$. The first copy is reversed
upside-down to deal with displaced 0's and is applied to all registers of the first
t columns. The second copy is applied to all registers of the last t columns to deal
with the displaced 1's. This requires $\delta'(Nt/M, t)$ layers and at most
$\frac{Nt}{M}\delta'(Nt/M, t)$ comparators. The result of this part is that the dirty area is
reduced to at most $\frac{2M\Delta}{t} + M$ registers.

The third part is a cleaning network (based on X) for a dirty area of size
$\frac{2M\Delta}{t} + M$, which requires $2\delta\left(\frac{4M\Delta}{t} + 2M, t\right)$ layers and
$\frac{Nt}{2M\Delta + Mt}\gamma\left(\frac{4M\Delta}{t} + 2M, t\right)$ comparators.



Now we show how we can use the refinement lemmas to construct fault-tolerant
and correction networks of a small depth. In the construction of the t-fault-tolerant
network we apply Piotrow's network [4].

Theorem 8. There exists a constant c such that for an arbitrary s there exists
a t-fault-tolerant network of depth:

$$\alpha\left(1 + \frac{1}{s}\right)\log N + c \log\log N + (2s + c)t$$

having $O(Nt)$ comparators.

Proof. We defined the $(t, t^2 + t)$-partial-fault-tolerant network $T(s, N, t)$, which has
depth $\alpha(1 + 1/s) \log N + (2s + c')t$. The theorem follows from the Refinement Lemma
applied to $X_N$ being Piotrow's network, $Y_N = T(s, N, t)$, $M = t \log N$.

The above network is practical for small t. If we fix t and take $s = \sqrt{\log N}$,
then we get a t-fault-tolerant network of depth $\alpha \log N + O(\sqrt{\log N})$.

Similarly as for fault-tolerant networks, we can now construct a t-correction
network by applying the Refinement Lemma.

Theorem 9. For any integer s there exists a t-correction network of depth

$$\alpha\left(1 + \frac{1}{s}\right)\log N + c'_s(\log t \log\log N)^2$$

for some constant $c'_s$ depending on s.

Proof. We apply the Refinement Lemma taking $Y_N = C(\cdot)$, $X_N$ the Batcher network and
$M = t \log N$.

This network has depth $\alpha(1 + 1/s) \log N + o(\log N)$ if $\log t = o(\sqrt{\log N}/\log\log N)$.

We can take $s = \log\log\log N$ and, since $c_s$ grows as $2^s$, we obtain the following
corollary:

Corollary 2. For any t there exists a t-correction network of depth

$$\alpha\left(1 + \frac{1}{\log\log\log N}\right)\log N + c \log^2 t \,(\log\log N)^4 \sim \alpha \log N.$$

We can also apply the Refinement Lemma once again, taking the network from the
previous theorem for s = 1 as $X_N$. We put $Y_N = C(s, N, t)$, $M = t \log N$ and
get the following corollary.

Corollary 3. For any integers s, t there exists a t-correction network of depth

$$\alpha\left(1 + \frac{1}{s}\right)\log N + c'' \log t \log\log N + o(\log\log N)$$

for some constant c'' depending on s.

Unfortunately it is not clear if this corollary improves the bound on t for
which we can make a correction network of depth $\approx \alpha(1 + 1/s) \log N$, because
the construction works well only for $t \ll N$.

6 Minimizing Number of Comparators

First we should know what the minimal numbers of comparators for t-fault-tolerant
and t-correction networks are. Any t-fault-tolerant network has at least
t comparators going from any register different from the highest one to registers
with higher indexes (to make it impossible to have them all faulty for a 1-disturbed
input). So it has at least $(N - 1)t = \Omega(Nt)$ comparators. Any t-correction
network has to be a t-selector, which forces it to have $\Omega(N \log t)$ comparators
[6]. These asymptotic lower bounds on the numbers of comparators in correction
networks prove to be achieved.

A t-fault-tolerant network having an asymptotically optimal number of comparators
is the t-fault-tolerant network from the previous section. It has depth
$\alpha(1 + 1/s) \log N + O(\log\log N + st)$.

An optimal t-correction network we construct using the Refinement Lemma.
Similar techniques to those we use in this section can be applied to reduce the
numbers of comparators of practical correction networks, but in order not to make this
paper too long we do not describe how to do it. The simplest way to make
these practical constructions is to use the Batcher network instead of AKS in what
follows. Unfortunately we were not able to find a t-correction network with an
asymptotically optimal number of comparators without using the AKS network, so
our further constructions are not practical. First we construct a network that is
asymptotically optimal in the sense of the number of comparators but not in
the sense of depth.

Lemma 11. There exists a t-correction network that for some constant c and
any input size N has depth $cN \log t / t$ and at most $cN \log t$ comparators.

Proof. Let AKS denote a sorting network which has depth $\frac{c}{2} \log t$ for input of
size 2t [1]. It has at most $\frac{c}{2} t \log t$ comparators. We index registers with integers
1, ..., N. We define sets of registers:

$$S_i = \{r_{(i-1)t+1}, \ldots, r_{(i-1)t+2t}\}.$$

Our network consists of $2N/t - 1$ parts of $\frac{c}{2} \log t$ layers each. Each part consists of
an AKS network on a register set $S_i$. Thus we apply AKS subsequently to

$$S_1, S_2, \ldots, S_{N/t}, S_{(N/t)-1}, \ldots, S_1.$$

It is easy to see that what we constructed is really a t-correction network.
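
The sweep used in this proof can be illustrated with a short sketch. As an assumption of the example, Python's `sorted` replaces the AKS network on each window of 2t registers, registers are indexed from 0, the top window is simply truncated at the end of the input, and the function names are hypothetical.

```python
from itertools import combinations


def sweep_correct(bits, t, sort_window=sorted):
    """Sort windows S_1, S_2, ..., S_last and then back down to S_1.
    Window S_i covers registers (i-1)*t .. (i-1)*t + 2*t - 1."""
    regs = list(bits)
    starts = list(range(0, len(regs), t))
    for low in starts + starts[-2::-1]:  # upward pass, then downward pass
        regs[low:low + 2 * t] = sort_window(regs[low:low + 2 * t])
    return regs


def t_disturbed_inputs(n, t):
    """Generate all sorted 0-1 sequences of length n with at most t entries flipped."""
    for zeros in range(n + 1):
        base = [0] * zeros + [1] * (n - zeros)
        for count in range(t + 1):
            for positions in combinations(range(n), count):
                bits = base[:]
                for pos in positions:
                    bits[pos] ^= 1
                yield bits
```

The upward pass carries up to t displaced 1's to the border, and the downward pass does the same for displaced 0's, which is the intuition behind the "easy to see" claim.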

When we have the t-correction network from the last lemma we can put it as
$X_N$ into the Refinement Lemma, taking $M = t \log N / \log t$. As $Y_N$ we can use AKS, which is
a (t, 0)-partial-correction network. This way we obtain the following corollary:

Corollary 4. There exists a t-correction network of depth $O(\log N)$ having
$O(N \log t)$ comparators.

Further on we can take the correction network from the last lemma as $X_N$, $C(\cdot)$ as
$Y_N$ and $M = t \log N$. As a result, by the Refinement Lemma we obtain the following
corollary:

Corollary 5. For any integer s there exists a t-correction network of depth

$$\alpha(1 + 1/s) \log N + c'_s \log t \log\log N$$

for some constant $c'_s$ depending on s, which has $O(N \log t)$ comparators.



7 Conclusions 

We constructed t-fault-tolerant and t-correction networks of depths $\sim \alpha \log N$
for fixed t. This is less than the depth of the 1-correction network found by Schimmler
and Starke [5]. Network $T(\cdot)$ seems to be better for practical purposes, although
it is worse than $C(\cdot)$ for combinations of N and t where N is big and $t > \log N$.
Some considerations we did not include in this paper seem to indicate that
the following conjecture is true. This conjecture was originally posed by Mirek
Kutylowski; the author's only contribution is the constant $\alpha$.

Conjecture 1. The lower bound for the depth of a 1-correction network is

$$\alpha \log N - c$$

for some small constant c.

Because the author was unable to find 2-correction networks of depth
asymptotically better than $T(\cdot)$, he dares to pose another conjecture concerning
2-correction networks.

Conjecture 2. The lower bound for the depth of a 2-correction network is

$$\alpha \log N + c\sqrt{\log N}$$

for some constant c > 0.

Acknowledgments 

The author wishes to thank Mirek Kutylowski, Krzysiek Lorys and Marek Piotrow
for presenting the problems, for helpful discussions and for their encouragement to write
this paper. The author also thanks Mirek Kutylowski for many valuable remarks that
improved the presentation.



References 

1. M. Ajtai, J. Komlós, E. Szemerédi, Sorting in c log n parallel steps, Combinatorica
3 (1983), 1-19.

2. K.E. Batcher, Sorting networks and their applications, in AFIPS Conf. Proc. 32
(1968), 307-314.

3. M. Kik, M. Kutylowski, M. Piotrow, Correction Networks, in Proc. of 1999 ICPP,
40-47.

4. M. Piotrow, Depth Optimal Sorting Networks Resistant to k Passive Faults, in Proc.
7th SIAM Symposium on Discrete Algorithms (1996), 242-251 (also accepted for
SIAM J. Comput.).

5. M. Schimmler, C. Starke, A Correction Network for N-Sorters, SIAM J. Comput.
18 (1989), 1179-1197.

6. A.C. Yao, Bounds on Selection Networks, SIAM J. Comput. 9 (1980), 566-582.

7. A.C. Yao, F.F. Yao, On Fault-Tolerant Networks for Sorting, SIAM J. Comput. 14
(1985), 120-128.







Least Adaptive Optimal Search 
with Unreliable Tests 



Ferdinando Cicalese^{1,*}, Daniele Mundici^{2}, and Ugo Vaccaro^{1}

^1 Dipartimento di Informatica ed Applicazioni, University of Salerno,
84081 Baronissi (SA), Italy
{cicalese,uv}@dia.unisa.it,
http://www.dia.unisa.it/{~cicalese,~uv}

^2 Dipartimento Scienze Informazione, University of Milan,
Via Comelico 39-41, 20135 Milan, Italy
mundici@mailserver.unimi.it



Abstract. We consider the basic problem of searching for an unknown
m-bit number by asking the minimum possible number of yes-no questions,
when up to a finite number e of the answers may be erroneous.
In case the (i + 1)th question is adaptively asked after receiving the
answer to the ith question, the problem was posed by Ulam and Rényi and
is strictly related to Berlekamp's theory of error correcting communication
with noiseless feedback. Conversely, in the fully non-adaptive model,
when all questions are asked before knowing any answer, the problem
amounts to finding a shortest e-error correcting code. Let $q_e(m)$ be the
smallest integer q satisfying Berlekamp's bound $\sum_{i=0}^{e} \binom{q}{i} \le 2^{q-m}$. Then
at least $q_e(m)$ questions are necessary, in the adaptive as well as in
the non-adaptive model. In the fully adaptive case, optimal searching
strategies using exactly $q_e(m)$ questions always exist up to finitely many
exceptional m's. At the opposite non-adaptive case, searching strategies
with exactly $q_e(m)$ questions (or equivalently, perfect e-error correcting
codes with $2^m$ codewords of length $q_e(m)$) are rather the exception,
already for e = 2, and do not exist for e > 2. In this paper we show
that for any e > 1 and sufficiently large m, optimal (indeed, perfect)
strategies do exist using a first batch of m non-adaptive questions and
then, only depending on the answers to these m questions, a second
batch of $q_e(m) - m$ non-adaptive questions. Since even in the fully adaptive
case $q_e(m) - 1$ questions do not suffice to find the unknown number,
and $q_e(m)$ questions generally do not suffice in the non-adaptive case,
the results of our paper provide e-fault tolerant searching strategies with
minimum adaptiveness and minimum number of tests.



1 Introduction 

We consider the following scenario: Two players, called Questioner and Responder,
first agree on fixing an integer m and a search space $S = \{0, \ldots, 2^m - 1\}$.
Then the Responder thinks of a number $x \in S$ and the Questioner must find
out x by asking questions to which the Responder can only answer yes or no.
It is agreed that the Responder is allowed to lie (or just to be inaccurate) at
most e times, where the integer e is fixed and known to the Questioner. We are
interested in the problem of determining the minimum number of questions the
Questioner has to ask in order to infallibly guess the number x.

When the questions are asked adaptively, i.e., the $i$th question is asked knowing
the answer to the $(i-1)$th question, the problem is generally referred to as the
Ulam-Renyi game [p. 281], [p. 47], and is closely related to Berlekamp's
theory of error correcting communication with noiseless feedback. At the
other, non-adaptive extreme, when the totality of questions is asked at the outset,
before knowing any answer, the problem amounts to finding a shortest e-error
correcting binary code with $2^m$ codewords.

It is known that at least $q_e(m)$ questions are necessary in the adaptive and, a
fortiori, in the non-adaptive case, where $q_e(m)$ is the smallest integer $q$ satisfying
Berlekamp's bound $\sum_{i=0}^{e} \binom{q}{i} \le 2^{q-m}$. In the fully adaptive case, an important
result of Spencer shows that $q_e(m)$ questions are always sufficient, up to
finitely many exceptional $m$'s. Optimal searching strategies had been previously
exhibited for the cases $e = 1$, $e = 2$ and $e = 3$.
Thus, fully adaptive fault tolerant search can be performed in a very satisfactory 
manner. 
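Berlekamp's bound is easy to evaluate numerically. The following is a minimal sketch (the function name `berlekamp_min_questions` is ours, not the paper's) that returns the smallest $q$ with $\sum_{i=0}^{e} \binom{q}{i} \le 2^{q-m}$ by linear search.

```python
from math import comb


def berlekamp_min_questions(m: int, e: int) -> int:
    """Smallest q such that sum_{i=0}^{e} C(q, i) <= 2^(q - m).

    m: the search space has 2^m elements; e: maximum number of lies.
    """
    q = m  # fewer than m questions cannot even separate 2^m candidates
    while sum(comb(q, i) for i in range(e + 1)) > 2 ** (q - m):
        q += 1
    return q


if __name__ == "__main__":
    for m, e in [(4, 1), (11, 1), (20, 2), (99, 3)]:
        print(m, e, berlekamp_min_questions(m, e))
```

For $e = 1$ and $m = 4$ the bound gives 7, the length of the Hamming code with 16 codewords.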

However, in many practical situations it is desirable to have searching strate- 
gies with “small degree” of adaptiveness, that is, searching strategies in which 
all questions (or at least, many of them) can be prepared in advance, and asked 
in parallel. This is the case, e.g., when the Questioner and the Responder are far 
away from each other and can interact only on a slow channel; or in all situations 
when formulating the queries is a costly process, and therefore the Questioner 
finds it more convenient and time-saving to prepare them in advance. We refer to 
the monographs for a discussion on the power of adaptive and non-adaptive 
searching strategies and their possible uses in different contexts. 

Unfortunately, in the totally non-adaptive case, a series of negative results
culminating in the celebrated paper by Tietavainen shows
that searching strategies with exactly $q_e(m)$ questions (or equivalently, perfect
binary e-error correcting codes with $2^m$ codewords of length $q_e(m)$) are sporadic
exceptions already for $e = 2$, and do not exist for $e > 2$, except in trivial
cases. Thus, adaptiveness in Ulam-Renyi games can be completely eliminated
only by significantly increasing the number of questions in the solution strategy.$^1$
Our purpose in this paper is to investigate the minimum amount of adaptiveness
required by all successful searching strategies with exactly $q_e(m)$ questions.



$^1$ The situation is completely different in the case of no lies: here an optimal, totally
non-adaptive searching strategy with $\lceil \log_2 |S| \rceil$ questions simply amounts to asking
$\lceil \log_2 |S| \rceil$ queries about the locations of the bit 1 in the binary expansion of the
unknown number $x \in S$.
1.1 Our Results

We exactly quantify the minimum amount of adaptiveness needed to solve the 
Ulam-Renyi problem, while still constraining the total number of questions to 
Berlekamp's minimum $q_e(m)$. Our main result is that for each $e$, and for all sufficiently
large $m$, there exist searching strategies of shortest length (using exactly
the minimum number $q_e(m)$ of questions) in which questions can be submitted
to the Responder in only two rounds. Specifically, for the Questioner to infallibly
guess the Responder's secret number $x \in S$ it is sufficient to ask a first batch
of $m$ non-adaptive questions, and then, only depending on the $m$-tuple of answers,
ask a second mini-batch of $n$ non-adaptive questions. Our strategies are
perfect, in that $m + n$ coincides with Berlekamp's minimum $q_e(m)$, the number
of questions that are a priori necessary to accommodate all possible answering 
strategies of the Responder — once he is allowed to lie up to e times. Since the 
Questioner can adapt his strategy only once, our paper yields e-fault tolerant 
search strategies with minimum adaptiveness and the least possible number of 
tests. Our main tool is the discovery of a close relation between searching strate- 
gies tolerating e lies and certain special families of error correcting codes, which 
will be described in Section 3. In the last section we specialize our analysis to
the case $e = 3$; we shall give an explicit description of our searching strategies
for the Ulam-Renyi game, for all $m \ge 99$.



1.2 Related Work 



The general issue of coping with unreliable information (and/or unreliable components)
in computing is an important problem in computer science, and its
study goes back to the work of von Neumann. The problem of dealing with
erroneous information in search strategies (what we call here the Ulam-Renyi game)
has received considerable attention in the last decades, beginning with m (see 
f2l4l5l9Hlll2l2()l22l2b| and references therein). The survey paper pi4| gives a 
detailed account of the relevant literature on the subject. In the paper the 
Ulam-Renyi game is embedded in a broader context. 

We have already mentioned the connections between Ulam-Renyi games and
Berlekamp's theory of error correcting communication with noiseless feedback.
Other interesting connections between Ulam-Renyi games and different areas
of computer science and logic have also been found. For
the sake of conciseness, we shall limit ourselves to mentioning here only those results
which are directly related to our present issue of adaptive vs. non-adaptive
search. It is well known that for $e = 1$, Hamming codes yield non-adaptive searching
strategies (i.e., one round strategies) with the smallest possible number $q_1(m)$
of questions. Indeed, Pelc [23] showed that adaptiveness in this case is irrelevant
even under the stronger assumption that repetition of the same question is
forbidden. The first significant case where the dichotomy between adaptive and
non-adaptive search makes its appearance is when $e = 2$. Two-round optimal
strategies for the case $e = 2$ were given in earlier work. Our paper extends that result
to the case of an arbitrary number $e$ of errors/lies. Other results related
to the issue of fully adaptive vs. totally non-adaptive searching strategies have
also been obtained.

2 The Ulam-Renyi Game 

For some fixed integer $m$, let $S = \{0, 1, \dots, 2^m - 1\}$ be the search space. By
a yes-no question we simply mean an arbitrary subset $T$ of $S$. If the answer
to the question $T$ is "yes", numbers in $T$ are said to satisfy the answer, while
numbers in $S \setminus T$ falsify it. A negative answer to question $T$ has the same effect
as a positive answer to the opposite question $S \setminus T$. At any stage of the game, a
number $y \in S$ must be rejected from consideration if, and only if, it falsifies more
than $e$ answers. The remaining numbers of $S$ still are possible candidates for the
unknown $x$. At any time the Questioner's state of knowledge is represented by
an $(e+1)$-tuple $\sigma = (A_0, A_1, A_2, \dots, A_e)$ of pairwise disjoint subsets of $S$, where $A_i$
is the set of numbers falsifying exactly $i$ answers, $i = 0, 1, 2, \dots, e$. The initial
state is naturally given by $(S, \emptyset, \emptyset, \dots, \emptyset)$. A state $(A_0, A_1, A_2, \dots, A_e)$ is final
iff $A_0 \cup A_1 \cup A_2 \cup \dots \cup A_e$ either has exactly one element, or is empty. In this
latter case, evidently, more than $e$ lies have been told.

For any state $\sigma = (A_0, A_1, A_2, \dots, A_e)$ and question $T \subseteq S$, the two states
$\sigma^{yes}$ and $\sigma^{no}$ respectively resulting from a positive or a negative answer, are
given by

$$\sigma^{yes} = (A_0^{yes}, \dots, A_e^{yes}) \quad \text{and} \quad \sigma^{no} = (A_0^{no}, \dots, A_e^{no}) \qquad (1)$$

where, for the sake of definiteness, we let $A_{-1} = \emptyset$, and

$$A_i^{yes} = (A_i \cap T) \cup (A_{i-1} \setminus T) \quad \text{and} \quad A_i^{no} = (A_i \setminus T) \cup (A_{i-1} \cap T) \qquad (2)$$

for each $i = 0, 1, \dots, e$. Given a state $\sigma$, suppose questions $T_1, \dots, T_t$ have been
asked and answers $b = b_1, \dots, b_t$ have been received (with $b_i \in \{yes, no\}$).
Iterated application of the above formulas yields a sequence of states

$$\sigma_0 = \sigma, \quad \sigma_1 = \sigma_0^{b_1}, \quad \sigma_2 = \sigma_1^{b_2}, \quad \dots, \quad \sigma_t = \sigma_{t-1}^{b_t}. \qquad (3)$$

By a strategy $\mathcal{S}$ with $q$ questions we mean the binary tree of depth $q$, where
each node $\nu$ is mapped into a question $T_\nu$, and the two edges $\eta_{left}$, $\eta_{right}$ generated
by $\nu$ are respectively labelled yes and no. Let $\eta = \eta_1, \dots, \eta_q$ be a path in $\mathcal{S}$, from
the root to a leaf, with respective labels $b_1, \dots, b_q$, generating nodes $\nu_1, \dots, \nu_q$
and associated questions $T_{\nu_1}, \dots, T_{\nu_q}$. Fix an arbitrary state $\sigma$. Then, according
to (3), iterated application of (1)-(2) naturally transforms $\sigma$ into $\sigma^\eta$ (where the
dependence on the $b_j$ and $T_j$ is understood). We say that strategy $\mathcal{S}$ is winning
for $\sigma$ iff for every path $\eta$ the state $\sigma^\eta$ is final. A strategy is said to be
non-adaptive iff all nodes at the same depth of the tree are mapped into the same
question.

Let $\sigma = (A_0, A_1, A_2, \dots, A_e)$ be a state. For each $i = 0, 1, 2, \dots, e$ let $a_i = |A_i|$
be the number of elements of $A_i$. Then the $(e+1)$-tuple $(a_0, a_1, a_2, \dots, a_e)$ is called
the type of $\sigma$. The Berlekamp weight of $\sigma$ before $q$ questions, $q = 0, 1, 2, \dots$, is
given by

$$w_q(\sigma) = \sum_{i=0}^{e} a_i \sum_{j=0}^{e-i} \binom{q}{j}. \qquad (4)$$

The character $\mathrm{ch}(\sigma)$ of a state $\sigma$ is the smallest integer $q \ge 0$ such that $w_q(\sigma) \le 2^q$.

By abuse of notation, the weight of any state $\sigma$ of type $(a_0, a_1, a_2, \dots, a_e)$
before $q$ questions will be denoted $w_q(a_0, a_1, a_2, \dots, a_e)$. Similarly, its character
will also be denoted $\mathrm{ch}(a_0, a_1, a_2, \dots, a_e)$.
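The state update of formula (2) can be sketched directly with Python sets. This is only an illustration; the names `apply_answer` and `is_final` are ours, and a state is represented as a list `[A_0, ..., A_e]` of disjoint sets.

```python
def apply_answer(state, question, reply):
    """Return the state after `reply` ("yes" or "no") to `question`, as in formula (2).

    `state` is a list [A_0, ..., A_e] of pairwise disjoint sets;
    `question` is the subset T of the search space being asked about.
    """
    new_state = []
    for i, level in enumerate(state):
        previous = state[i - 1] if i > 0 else set()  # A_{-1} is empty
        if reply == "yes":
            new_state.append((level & question) | (previous - question))
        else:
            new_state.append((level - question) | (previous & question))
    return new_state


def is_final(state):
    """A state is final when at most one candidate number survives."""
    return len(set().union(*state)) <= 1


if __name__ == "__main__":
    state = [{0, 1, 2, 3}, set()]            # e = 1, search space {0, 1, 2, 3}
    state = apply_answer(state, {0, 1}, "yes")
    state = apply_answer(state, {0, 2}, "no")
    print(state, is_final(state))
```

In the usage example the number 2 falsifies both answers, so it drops out of the state when $e = 1$.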

As an immediate consequence of the above definition we have the following
monotonicity properties: For any two states $\sigma' = (A_0', A_1', A_2', \dots, A_e')$
and $\sigma'' = (A_0'', A_1'', A_2'', \dots, A_e'')$ respectively of type $(a_0', a_1', a_2', \dots, a_e')$ and
$(a_0'', a_1'', a_2'', \dots, a_e'')$, if $a_i' \le a_i''$ for all $i = 0, 1, 2, \dots, e$ then

$$\mathrm{ch}(\sigma') \le \mathrm{ch}(\sigma'') \quad \text{and} \quad w_q(\sigma') \le w_q(\sigma'') \qquad (5)$$

for each $q \ge 0$. Moreover, if there exists a winning strategy for $\sigma''$ with $q$ questions
then there exists also a winning strategy for $\sigma'$ with $q$ questions. Note that
$\mathrm{ch}(\sigma) = 0$ iff $\sigma$ is a final state.
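The Berlekamp weight (4) and the character depend only on the type of a state, so they can be computed from the tuple of sizes. A minimal sketch, with function names of our own choosing:

```python
from math import comb


def weight(sizes, q):
    """Berlekamp weight w_q(a_0, ..., a_e) as in formula (4)."""
    e = len(sizes) - 1
    return sum(
        a * sum(comb(q, j) for j in range(e - i + 1))
        for i, a in enumerate(sizes)
    )


def character(sizes):
    """Smallest q >= 0 with w_q(sizes) <= 2^q."""
    q = 0
    while weight(sizes, q) > 2 ** q:
        q += 1
    return q


if __name__ == "__main__":
    print(character((2 ** 4, 0)))   # type of the initial state for m = 4, e = 1
    print(character((1, 4)))        # type after four bitwise questions
    print(character((0, 1)))        # a final state
```

A state of type $(0, 1)$ has weight 1 for every $q$, so its character is 0, in agreement with the remark that the character vanishes exactly on final states.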

Lemma 1. Let $\sigma$ be an arbitrary state, and $T \subseteq S$ a question. Let $\sigma^{yes}$ and
$\sigma^{no}$ be as in (1)-(2).

(i) (Conservation Law). For any integer $q \ge 1$ we have $w_q(\sigma) = w_{q-1}(\sigma^{yes}) + w_{q-1}(\sigma^{no})$.

(ii) (Berlekamp's lower bound). If $\sigma$ has a winning strategy with $q$ questions then
$q \ge \mathrm{ch}(\sigma)$. □

In complete analogy with the notion of perfect error correcting code, we say
that a winning strategy for $\sigma$ with $q$ questions is perfect iff $q = \mathrm{ch}(\sigma)$. In agreement
with the above notation, we shall write $q_e(m)$ instead of $\mathrm{ch}(2^m, 0, \dots, 0)$.

Let $\sigma = (A_0, A_1, A_2, \dots, A_e)$ be a state. Let $T \subseteq S$ be a question. We say
that $T$ is balanced for $\sigma$ iff for each $j = 0, 1, 2, \dots, e$, we have $|A_j \cap T| = |A_j \setminus T|$.

The following is easy to prove.

Lemma 2. Let $T$ be a balanced question for a state $\sigma = (A_0, A_1, A_2, \dots, A_e)$.
Let $n = \mathrm{ch}(\sigma)$. Let $\sigma^{yes}$ and $\sigma^{no}$ be as in (1)-(2) above. Then

(i) $w_q(\sigma^{yes}) = w_q(\sigma^{no})$, for each integer $q \ge 0$,

(ii) $\mathrm{ch}(\sigma^{yes}) = \mathrm{ch}(\sigma^{no}) = n - 1$.
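The Conservation Law of Lemma 1 and the statements of Lemma 2 can be checked numerically on small states. The sketch below is an illustration only; it re-implements the update (2) and the weight (4) with helper names of our own.

```python
from math import comb


def weight(sizes, q):
    """Berlekamp weight of a state with the given level sizes, formula (4)."""
    e = len(sizes) - 1
    return sum(a * sum(comb(q, j) for j in range(e - i + 1)) for i, a in enumerate(sizes))


def character(sizes):
    """Smallest q >= 0 with weight(sizes, q) <= 2^q."""
    q = 0
    while weight(sizes, q) > 2 ** q:
        q += 1
    return q


def split(state, question):
    """Return (state after 'yes', state after 'no') following formula (2)."""
    yes, no = [], []
    for i, level in enumerate(state):
        previous = state[i - 1] if i > 0 else set()
        yes.append((level & question) | (previous - question))
        no.append((level - question) | (previous & question))
    return yes, no


def sizes(state):
    return tuple(len(level) for level in state)


def conservation_holds(state, question, q):
    """Check w_q(sigma) = w_{q-1}(sigma^yes) + w_{q-1}(sigma^no) for q >= 1."""
    yes, no = split(state, question)
    return weight(sizes(state), q) == weight(sizes(yes), q - 1) + weight(sizes(no), q - 1)


def is_balanced(state, question):
    return all(len(level & question) == len(level - question) for level in state)


if __name__ == "__main__":
    sigma = [{0, 1, 2, 3}, {4, 5}]
    T = {0, 1, 4}
    yes, no = split(sigma, T)
    print(is_balanced(sigma, T), character(sizes(sigma)), character(sizes(yes)), character(sizes(no)))
```

In the usage example the question is balanced, and both resulting states have character one less than the starting state, as Lemma 2(ii) predicts.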

3 Strategies vs. Codes 

Let us first recall some notation from Coding Theory.
Fix an integer $n > 0$ and let $x, y \in \{0,1\}^n$. The Hamming distance $d_H(x, y)$
is defined by

$$d_H(x, y) = |\{i \in \{1, \dots, n\} \mid x_i \ne y_i\}|,$$

where, as above, $|A|$ denotes the number of elements of $A$, and $x_i$ (resp. $y_i$)
denotes the $i$th component of $x$ (resp. $y$).

The Hamming sphere $B_r(x)$ with radius $r$ and center $x$ is the set of elements
of $\{0,1\}^n$ whose Hamming distance from $x$ is at most $r$, in symbols,

$$B_r(x) = \{y \in \{0,1\}^n \mid d_H(x, y) \le r\}.$$

Notice that for any $x \in \{0,1\}^n$, and $r \ge 0$, we have $|B_r(x)| = \sum_{i=0}^{r} \binom{n}{i}$. The
Hamming weight $w_H(x)$ of $x$ is the number of non-zero digits of $x$. Throughout
this paper, by a code we shall mean a binary code, in the following sense:

Definition 1. A (binary) code $C$ of length $n$ is a non-empty subset of $\{0,1\}^n$.
Its elements are called codewords. The minimum distance of $C$ is given by

$$\delta(C) = \min\{d_H(x, y) \mid x, y \in C, x \ne y\}.$$

We say that $C$ is an $(n, m, d)$ code iff $C$ has length $n$, $|C| = m$ and $\delta(C) = d$. The
minimum weight of $C$ is the minimum of the Hamming weights of its codewords,
in symbols,

$$\mu(C) = \min\{w_H(x) \mid x \in C\}.$$

Let $C_1$ and $C_2$ be two codes of length $n$. The minimum distance between $C_1$ and
$C_2$ is defined by

$$\Delta(C_1, C_2) = \min\{d_H(x, y) \mid x \in C_1, y \in C_2\}.$$
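For small lengths these quantities can be computed by brute force. The sketch below represents binary words as strings of `0` and `1`; all function names are ours, and the minimum distance of a one-word code is returned as infinity, which is a convention of this sketch.

```python
from itertools import combinations, product
from math import comb


def hamming_distance(x, y):
    """Number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def hamming_weight(x):
    """Number of non-zero digits of x."""
    return x.count("1")


def sphere(x, r):
    """All words of the same length as x at distance at most r from x."""
    words = ("".join(bits) for bits in product("01", repeat=len(x)))
    return {w for w in words if hamming_distance(x, w) <= r}


def minimum_distance(code):
    """delta(C); infinity for a code with a single codeword (our convention)."""
    return min((hamming_distance(x, y) for x, y in combinations(code, 2)), default=float("inf"))


def minimum_weight(code):
    """mu(C)."""
    return min(hamming_weight(x) for x in code)


def distance_between(code1, code2):
    """Delta(C_1, C_2)."""
    return min(hamming_distance(x, y) for x in code1 for y in code2)


if __name__ == "__main__":
    print(len(sphere("00000", 2)), sum(comb(5, i) for i in range(3)))
    print(minimum_distance(["000", "111"]))
```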

We now describe a correspondence between non-adaptive winning strategies 
and certain special codes. This will be a key tool to prove the main results of 
our paper. 

Lemma 3. Let $\sigma = (A_0, A_1, A_2, \dots, A_e)$ be a state of type $(a_0, a_1, a_2, \dots, a_e)$.
Let $n \ge \mathrm{ch}(\sigma)$. Then a non-adaptive winning strategy for $\sigma$ with $n$ questions
exists if and only if for all $i = 0, 1, 2, \dots, e-1$ there are integers $d_i \ge 2(e-i)+1$,
together with an $e$-tuple of codes $\Gamma = \{C_0, C_1, C_2, \dots, C_{e-1}\}$, such that each $C_i$ is
an $(n, a_i, d_i)$ code, and $\Delta(C_i, C_j) \ge 2e - (i+j) + 1$ (whenever $0 \le i < j \le e-1$).

Proof. We first prove the implication strategy $\Rightarrow$ codes.

Assume $\sigma = (A_0, A_1, A_2, \dots, A_e)$ to be a state of type $(a_0, a_1, a_2, \dots, a_e)$
having a non-adaptive winning strategy $\mathcal{S}$ with $n$ questions $T_1, \dots, T_n$, $n \ge \mathrm{ch}(\sigma)$. Let the map

$$z \in A_0 \cup A_1 \cup A_2 \cup \dots \cup A_e \mapsto z^{\mathcal{S}} \in \{0,1\}^n$$

send each $z \in A_0 \cup A_1 \cup A_2 \cup \dots \cup A_e$ into the $n$-tuple of bits $z^{\mathcal{S}} = z_1^{\mathcal{S}} \cdots z_n^{\mathcal{S}}$
arising from the sequence of "true" answers to the questions "does $z$ belong to
$T_1$?", "does $z$ belong to $T_2$?", ..., "does $z$ belong to $T_n$?", via the identifications
1 = yes, 0 = no. More precisely, for each $j = 1, \dots, n$, $z_j^{\mathcal{S}} = 1$ iff $z \in T_j$. Let
$C \subseteq \{0,1\}^n$ be the range of the map $z \mapsto z^{\mathcal{S}}$. We shall first prove that, for
every $i = 0, \dots, e-1$, there exists an integer $d_i \ge 2(e-i)+1$ such that the set
$C_i = \{y^{\mathcal{S}} \in C \mid y \in A_i\}$ is an $(n, a_i, d_i)$ code.

Since $\mathcal{S}$ is winning, the map $z \mapsto z^{\mathcal{S}}$ is one-to-one, whence in particular
$|C_i| = a_i$, for any $i = 0, 1, 2, \dots, e-1$. Moreover by definition, the $C_i$'s are
subsets of $\{0,1\}^n$.

Claim 1. $\delta(C_i) \ge 2(e-i)+1$, for $i = 0, \dots, e-1$.

For otherwise (absurdum hypothesis) assuming $c$ and $d$ to be two distinct
elements of $A_i$ such that $d_H(c^{\mathcal{S}}, d^{\mathcal{S}}) \le 2(e-i)$, we will prove that $\mathcal{S}$ is not a
winning strategy. We can safely assume $c_j^{\mathcal{S}} = d_j^{\mathcal{S}}$ for each $j = 1, \dots, n - 2(e-i)$.
Suppose the answer to question $T_j$ is "yes" or "no" according as $c_j^{\mathcal{S}} = 1$
or $c_j^{\mathcal{S}} = 0$, respectively. Then after $n - 2(e-i)$ answers, the resulting state
has the form $\sigma' = (A_0', \dots, A_i', \dots, A_e')$, with $\{c, d\} \subseteq A_i'$, whence the type
of $\sigma'$ is $(a_0', \dots, a_i', \dots, a_e')$ with $a_i' \ge 2$. Since by [Lemma 2.5], $\mathrm{ch}(\sigma') \ge
\mathrm{ch}(0, 0, \dots, 0, 2, 0, \dots, 0) = 2(e-i)+1$, then from Lemma 1(ii) it follows that the
remaining $2(e-i)$ questions/answers do not suffice to reach a final state, thus
contradicting the assumption that $\mathcal{S}$ is winning.

Claim 2. For any $0 \le i < j \le e-1$ and for each $y \in A_i$ and $h \in A_j$ we have the
inequality $d_H(y^{\mathcal{S}}, h^{\mathcal{S}}) \ge 2e - (i+j) + 1$.

For otherwise (absurdum hypothesis) let $y \in A_i$, $h \in A_j$ be a counterexample,
and $d_H(y^{\mathcal{S}}, h^{\mathcal{S}}) \le 2e - (i+j)$. Writing $y^{\mathcal{S}} = y_1^{\mathcal{S}} \cdots y_n^{\mathcal{S}}$ and $h^{\mathcal{S}} = h_1^{\mathcal{S}} \cdots h_n^{\mathcal{S}}$, it
is no loss of generality to assume $h_k^{\mathcal{S}} = y_k^{\mathcal{S}}$, for all $k = 1, \dots, n - (2e - (i+j))$.
Suppose that the answer to question $T_k$ is "yes" or "no" according as $h_k^{\mathcal{S}} = 1$ or
$h_k^{\mathcal{S}} = 0$, respectively. Then the state resulting from these answers has the form
$\sigma'' = (A_0'', A_1'', A_2'', \dots, A_e'')$, where $y \in A_i''$ and $h \in A_j''$. Since by [Lemma 2.5],
$\mathrm{ch}(\sigma'') \ge \mathrm{ch}(0, \dots, 0, 1, 0, \dots, 0, 1, 0, \dots, 0) = 2e - (i+j) + 1$, then Lemma 1(ii)
again shows that $2e - (i+j)$ additional questions will not suffice to find the
unknown number. This contradicts the assumption that $\mathcal{S}$ is a winning strategy.

In conclusion, for all $i = 0, 1, \dots, e-1$, $C_i$ is an $(n, a_i, d_i)$ code with $d_i \ge
2(e-i)+1$ and for all $j = 0, \dots, i-1, i+1, \dots, e-1$, we have the desired
inequality $\Delta(C_i, C_j) \ge 2e - (i+j) + 1$.

Now we prove the converse implication: strategy $\Leftarrow$ codes.

Let $\Gamma = \{C_0, C_1, C_2, \dots, C_{e-1}\}$ be a family of codes satisfying the hypothesis.
Let

$$\mathcal{H} = \bigcup_{i=0}^{e-1} \bigcup_{x \in C_i} B_{e-i}(x).$$

By hypothesis, for any $i, j \in \{0, 1, \dots, e-1\}$ and $x \in C_i$, $y \in C_j$ we have
$d_H(x, y) \ge 2e - (i+j) + 1$. It follows that the Hamming spheres $B_{e-i}(x)$, $B_{e-j}(y)$
are pairwise disjoint and hence

$$|\mathcal{H}| = \sum_{i=0}^{e-1} a_i \sum_{j=0}^{e-i} \binom{n}{j}. \qquad (6)$$

Let $\mathcal{D} = \{0,1\}^n \setminus \mathcal{H}$. Since $n \ge \mathrm{ch}(a_0, a_1, a_2, \dots, a_e)$, by definition of character
we have $2^n \ge \sum_{i=0}^{e} a_i \sum_{j=0}^{e-i} \binom{n}{j}$. From (6) it follows that

$$|\mathcal{D}| = 2^n - \sum_{i=0}^{e-1} a_i \sum_{j=0}^{e-i} \binom{n}{j} \ge a_e. \qquad (7)$$

Let $\sigma = (A_0, A_1, A_2, \dots, A_e)$ be an arbitrary state of type $(a_0, a_1, a_2, \dots, a_e)$. Let
us now fix, once and for all, $e+1$ one-one maps $f_i: A_i \to C_i$, for $i = 0, 1, \dots, e-1$
and $f_e: A_e \to \mathcal{D}$. The existence of the map $f_i$, for all $i = 0, 1, \dots, e$, is ensured
by our assumptions about $\Gamma$, together with (7).

Let the map $f: A_0 \cup A_1 \cup A_2 \cup \dots \cup A_e \to \{0,1\}^n$ be defined by cases as
follows:

$$f(y) = \begin{cases} f_0(y), & y \in A_0 \\ f_1(y), & y \in A_1 \\ \quad\vdots \\ f_e(y), & y \in A_e \end{cases} \qquad (8)$$

Note that $f$ is one-one. For each $y \in A_0 \cup A_1 \cup A_2 \cup \dots \cup A_e$ and $j = 1, \dots, n$ let
$f(y)_j$ be the $j$th bit of the binary vector corresponding to $y$ via $f$. We can now
exhibit the questions $T_j$ of our searching strategies:

For each $j = 1, \dots, n$ let the set $T_j \subseteq S$ be defined by $T_j = \{z \in \bigcup_{i=0}^{e} A_i \mid
f(z)_j = 1\}$. Intuitively, letting $x_*$ denote the unknown number, $T_j$ asks "is the
$j$th bit of $f(x_*)$ equal to one?"

Again writing yes = 1 and no = 0, the answers to questions $T_1, \dots, T_n$ determine
an $n$-tuple of bits $b = b_1 \cdots b_n$. We shall show that the sequence $T_1, \dots, T_n$
yields an optimal non-adaptive winning strategy for $\sigma$. Let $\sigma_1 = \sigma^{b_1}$, $\sigma_2 =
\sigma_1^{b_2}$, ..., $\sigma_n = \sigma_{n-1}^{b_n}$. Arguing by cases we shall show that $\sigma_n = (A_0^*, A_1^*, \dots, A_e^*)$
is a final state.

By (1)-(2), for all $i = 0, 1, \dots, e$, any $z \in A_{e-i}$ that falsifies $> i$ answers does
not survive in $\sigma_n$, in the sense that $z \notin A_0^* \cup A_1^* \cup \dots \cup A_e^*$.

Case 1. $b \notin \bigcup_{i=0}^{e} \bigcup_{y \in A_i} B_{e-i}(f(y))$.

For all $i = 0, 1, \dots, e$, and for each $y \in A_i$ we must have $y \notin A_0^* \cup A_1^* \cup \dots \cup A_e^*$.
Indeed, the assumption $b \notin B_{e-i}(f(y))$ implies $d_H(f(y), b) > e - i$, whence $y$
falsifies $> e - i$ of the answers to $T_1, \dots, T_n$, and $y$ does not survive in $\sigma_n$. We
have proved that $A_0^* \cup A_1^* \cup \dots \cup A_e^*$ is empty, and $\sigma_n$ is a final state.

Case 2. $b \in B_{e-i}(f(y))$ for some $i \in \{0, 1, \dots, e\}$ and $y \in A_i$.

Then $y \in A_0^* \cup A_1^* \cup \dots \cup A_e^*$, because $d_H(f(y), b) \le e - i$, whence $y$ falsifies
$\le e - i$ answers. Our assumptions about $\Gamma$ ensure that, for all $j = 0, 1, \dots, e$ and
for all $y' \in A_j$ with $y' \ne y$, we have $b \notin B_{e-j}(f(y'))$. Thus, $d_H(f(y'), b) > e - j$
and $y'$ falsifies $> e - j$ of the answers to $T_1, \dots, T_n$, whence $y'$ does not survive
in $\sigma_n$. This shows that for any $y' \ne y$, $y' \notin A_0^* \cup A_1^* \cup \dots \cup A_e^*$. Therefore,
$A_0^* \cup A_1^* \cup \dots \cup A_e^*$ only contains the element $y$, and $\sigma_n$ is a final state.
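The codes-to-strategy direction can be tried out by brute force on tiny instances. The sketch below is an illustration under our own naming: it builds an encoding $f$ as in the proof (codewords for the levels $A_0, \dots, A_{e-1}$, uncovered words for $A_e$) and checks that every possible answer vector leaves at most one survivor. It uses the fact that an element of $A_i$ survives exactly when it falsifies at most $e - i$ of the $n$ answers.

```python
from itertools import product


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def build_encoding(state, codes, n):
    """Map each element of the state to an n-bit word, following the proof's map f.

    state: list [A_0, ..., A_e] of disjoint sets; codes: list [C_0, ..., C_{e-1}].
    Elements of A_e receive words lying outside all spheres B_{e-i}(c), c in C_i.
    """
    e = len(state) - 1
    words = ["".join(bits) for bits in product("01", repeat=n)]
    covered = {
        w
        for i, code in enumerate(codes)
        for c in code
        for w in words
        if hamming(w, c) <= e - i
    }
    free = [w for w in words if w not in covered]
    encoding = {}
    for level, pool in zip(state, list(codes) + [free]):
        if len(pool) < len(level):
            raise ValueError("not enough words available for this level")
        for y, word in zip(sorted(level), pool):
            encoding[y] = word
    return encoding


def survivors(state, encoding, answers):
    """Elements still possible after the n answers (question j asks for bit j of f)."""
    e = len(state) - 1
    return [
        y
        for i, level in enumerate(state)
        for y in level
        if hamming(encoding[y], answers) <= e - i
    ]


def is_winning(state, codes, n):
    encoding = build_encoding(state, codes, n)
    return all(
        len(survivors(state, encoding, "".join(bits))) <= 1
        for bits in product("01", repeat=n)
    )


if __name__ == "__main__":
    # e = 1, state of type (1, 4), character 3: the one-word code {000} suffices.
    print(is_winning([{0}, {1, 2, 3, 4}], [["000"]], 3))
    # A code of minimum distance 2 < 2e + 1 fails for a state of type (2, 0).
    print(is_winning([{0, 1}, set()], [["000", "011"]], 3))
```

The first call illustrates a perfect strategy with 3 questions; the second shows how violating the distance condition of Lemma 3 produces an answer vector with two survivors.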
4 Optimal Strategies with Minimum Adaptiveness

4.1 The First Batch of Questions 

Recall that $q_e(m) = \mathrm{ch}(2^m, 0, \dots, 0)$ is the smallest integer $q \ge 0$ such that
$2^q \ge 2^m \left( \binom{q}{e} + \binom{q}{e-1} + \dots + \binom{q}{2} + q + 1 \right)$. By Lemma 1(ii), at least $q_e(m)$ questions
are necessary to guess the unknown number $x_* \in S = \{0, 1, \dots, 2^m - 1\}$, if up
to $e$ answers may be erroneous. The aim of the rest of this paper is to prove
that, conversely, for sufficiently large $m$, $q_e(m)$ questions are sufficient under
the following constraint: first we use a predetermined non-adaptive batch of $m$
questions $D_1, \dots, D_m$, and then, only depending on the answers, we ask the
remaining $q_e(m) - m$ questions in a second non-adaptive batch. The first batch
of questions is easily described as follows:

For each $i = 1, 2, \dots, m$, let $D_i \subseteq S$ denote the question "Is the $i$th
binary digit of $x_*$ equal to 1?" Thus a number $y \in S$ belongs to $D_i$ iff
the $i$th bit of its binary expansion $y = y_1 \cdots y_m$ is equal to 1.

Upon identifying 1 = yes and 0 = no, let $b_i \in \{0,1\}$ be the answer to question
$D_i$. Let $b = b_1 \cdots b_m$. Repeated application of (1)-(2), beginning with the initial
state $\sigma = (S, \emptyset, \dots, \emptyset)$, shows that the resulting state as an effect of the answers
$b_1 \cdots b_m$ is an $(e+1)$-tuple $\sigma^b = (A_0, A_1, \dots, A_e)$, where

$$A_i = \{y \in S \mid d_H(y, b) = i\} \quad \text{for all } i = 0, 1, \dots, e.$$

Direct verification yields

$$|A_0| = 1, \quad |A_1| = m, \quad \dots, \quad |A_e| = \binom{m}{e}.$$

Thus $\sigma^b$ has type $(1, m, \binom{m}{2}, \dots, \binom{m}{e})$. As in (3), let $\sigma_i$ be the state resulting after
the first $i$ answers, beginning with $\sigma_0 = \sigma$. Since each question $D_i$ is balanced
for $\sigma_{i-1}$, an easy induction using Lemma 2 yields $\mathrm{ch}(\sigma^b) = q_e(m) - m$.
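The effect of the bitwise batch can be simulated for small $m$. The sketch below applies the update (2) once per bit and compares the result with the closed form $A_i = \{y \mid d_H(y, b) = i\}$. Function names are ours, and the answer vector $b$ is passed as an integer whose bit $i$ is the answer to the question about bit $i$.

```python
from math import comb


def apply_answer(state, question, reply_is_yes):
    """One step of formula (2) on a list [A_0, ..., A_e] of disjoint sets."""
    new_state = []
    for i, level in enumerate(state):
        previous = state[i - 1] if i > 0 else set()
        if reply_is_yes:
            new_state.append((level & question) | (previous - question))
        else:
            new_state.append((level - question) | (previous & question))
    return new_state


def bitwise_batch(m, e, b):
    """State reached from (S, empty, ..., empty) after the m bitwise questions."""
    universe = range(2 ** m)
    state = [set(universe)] + [set() for _ in range(e)]
    for i in range(m):
        d_i = {y for y in universe if (y >> i) & 1}   # numbers whose bit i is 1
        state = apply_answer(state, d_i, bool((b >> i) & 1))
    return state


if __name__ == "__main__":
    m, e, b = 5, 2, 0b10110
    state = bitwise_batch(m, e, b)
    print([len(level) for level in state], [comb(m, i) for i in range(e + 1)])
```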

For each $m$-tuple $b \in \{0,1\}^m$ of possible answers, we shall construct a non-adaptive
strategy $\mathcal{S}_b$ with $\mathrm{ch}(1, m, \binom{m}{2}, \dots, \binom{m}{e})$ questions, which turns out to
be winning for the state $\sigma^b$. To this purpose, let us consider the values of
$\mathrm{ch}(1, m, \binom{m}{2}, \dots, \binom{m}{e})$ for $m \ge 1$.

Definition 2. Let $e > 0$ and $n \ge 2e$ be arbitrary integers. The critical index
$m_{n,e}$ is the largest integer $m \ge 0$ such that $\mathrm{ch}(1, m, \binom{m}{2}, \dots, \binom{m}{e}) = n$.

Lemma 4. Let $e > 0$ and $n \ge 2e$ be arbitrary integers. Then $m_{n,e} < \sqrt[e]{e!}\, 2^{n/e} + e$.

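Proof. Recall that $m_{n,e} = \max\{m \mid w_n(1, m, \binom{m}{2}, \dots, \binom{m}{e}) \le 2^n\}$. We now set
$m^* = \sqrt[e]{e!}\, 2^{n/e} + e$. The desired result follows directly from the inequality
$w_n(1, m^*, \binom{m^*}{2}, \dots, \binom{m^*}{e}) > 2^n$. Indeed, we have

$$w_n\left(1, m^*, \tbinom{m^*}{2}, \dots, \tbinom{m^*}{e}\right) > \binom{m^*}{e} = \frac{m^*(m^*-1)\cdots(m^*-e+1)}{e!} > \frac{(m^*-e)^e}{e!} = 2^n.$$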
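The critical index can be computed by direct search, using the characterization $m_{n,e} = \max\{m \mid w_n(1, m, \binom{m}{2}, \dots, \binom{m}{e}) \le 2^n\}$ recalled in the proof. The following sketch (function names ours) also evaluates the bound of Lemma 4 in floating point for comparison.

```python
from math import comb, factorial


def weight(sizes, q):
    """Berlekamp weight of a state with the given level sizes, formula (4)."""
    e = len(sizes) - 1
    return sum(a * sum(comb(q, j) for j in range(e - i + 1)) for i, a in enumerate(sizes))


def critical_index(n, e):
    """Largest m with w_n(1, m, C(m,2), ..., C(m,e)) <= 2^n (assumes n >= 2e)."""
    m = 0
    while weight(tuple(comb(m + 1, i) for i in range(e + 1)), n) <= 2 ** n:
        m += 1
    return m


def lemma4_bound(n, e):
    """The quantity (e!)^(1/e) * 2^(n/e) + e from Lemma 4."""
    return factorial(e) ** (1 / e) * 2 ** (n / e) + e


if __name__ == "__main__":
    print(critical_index(19, 3), lemma4_bound(19, 3))
```

For $n = 19$ and $e = 3$ the search returns 127, the value of $m_{19,3}$ used in Section 5.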



4.2 The Second Batch of Questions 

We now prove that for all sufficiently large $m$ there exists a second batch of $n =
q_e(m) - m = \mathrm{ch}(1, m, \binom{m}{2}, \dots, \binom{m}{e})$ non-adaptive questions allowing the Questioner
to infallibly guess the Responder's secret number. We first need the following
lemma.

Lemma 5. For any fixed $e$ and all sufficiently large $n$ there exists a family
of codes $\Gamma = \{C_0, C_1, \dots, C_{e-1}\}$ together with integers $d_i \ge 2(e-i)+1$
($i = 0, 1, \dots, e-1$) such that

(i) Each $C_i$ is an $(n, \binom{m_{n,e}}{i}, d_i)$ code;

(ii) $\Delta(C_i, C_j) \ge 2e - (i+j) + 1$ (whenever $0 \le i < j \le e-1$).

Proof. Let $n' = n - e^2$. First we prove the existence of an $(n', \binom{m_{n,e}}{e-1}, d')$ code,
with $d' = 2e+1$. From Lemma 4 together with the well known inequality $e! \le
\left(\frac{e+1}{2}\right)^e$, it follows that, for all sufficiently large $n$,

$$\binom{m_{n,e}}{e-1} < \left(\sqrt[e]{e!}\, 2^{n/e} + e\right)^{e-1} \le \left(e\, 2^{n/e}\right)^{e-1} \le \frac{2^{n-e^2}}{\sum_{j=0}^{2e} \binom{n-e^2}{j}}.$$

The existence of the desired $(n', \binom{m_{n,e}}{e-1}, d')$ code now follows by the well known
Gilbert Bound.

We have proved that, for all sufficiently large $n$, there exists an $(n - e^2, \binom{m_{n,e}}{e-1}, d')$
code $C'$ with $d' \ge 2e+1$. For any $i = 0, 1, \dots, e-1$ let the $e^2$-tuple $a_i$ be defined
by

$$a_i = \underbrace{0 \cdots 0}_{ie}\ \underbrace{1 \cdots 1}_{e}\ \underbrace{0 \cdots 0}_{e^2 - (i+1)e}.$$

Furthermore, let $C_i''$ be the code obtained by appending the suffix $a_i$ to the
codewords of $C'$, in symbols,

$$C_i'' = C' \otimes a_i.$$

Trivially, $C_i''$ is an $(n, \binom{m_{n,e}}{e-1}, 2e+1)$ code for all $i = 0, 1, \dots, e-1$. Furthermore, we
have $\Delta(C_i'', C_j'') = 2e \ge 2e - (i+j) + 1$, whenever $0 \le i < j \le e-1$. For each
$i = 0, 1, \dots, e-1$, pick a subcode $C_i \subseteq C_i''$ with $|C_i| = \binom{m_{n,e}}{i}$. Then the new
family of codes $\Gamma = \{C_0, C_1, \dots, C_{e-1}\}$ satisfies both conditions (i) and (ii) and
the proof is complete.
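The suffix trick in this proof can be illustrated in a few lines. The sketch below assumes one concrete choice of suffixes, namely $e$ disjoint blocks of $e$ ones inside an $e^2$-bit string; this block form is our assumption for illustration, and any family of $e^2$-bit words at pairwise distance $2e$ would serve the same purpose. The base code in the usage example is a toy and not the code given by the Gilbert Bound.

```python
def suffix(i, e):
    """Assumed e^2-bit suffix a_i: ones in block i (0-based), zeros elsewhere."""
    return "0" * (i * e) + "1" * e + "0" * (e * (e - 1 - i))


def append_suffix(code, i, e):
    """The code obtained by appending the suffix a_i to every codeword."""
    return [word + suffix(i, e) for word in code]


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def distance_between(code1, code2):
    return min(hamming(x, y) for x in code1 for y in code2)


if __name__ == "__main__":
    e = 3
    base = ["0000000", "1111111"]          # toy code with minimum distance 7 = 2e + 1
    family = [append_suffix(base, i, e) for i in range(e)]
    for i in range(e):
        for j in range(i + 1, e):
            print(i, j, distance_between(family[i], family[j]))
```

Appending the same suffix to every codeword leaves distances inside each code unchanged, while codewords taken from two different extended codes differ in at least the $2e$ suffix positions.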

The following corollary implies the existence of minimum adaptiveness perfect 
searching strategies. 

Corollary 1. Fix an integer $e > 0$. Then for all sufficiently large integers $m$ and
for every state $\sigma$ of type $(1, m, \binom{m}{2}, \dots, \binom{m}{e})$ there exists a non-adaptive winning
strategy $\mathcal{S}$ such that the number of questions in $\mathcal{S}$ coincides with Berlekamp's
lower bound $\mathrm{ch}(\sigma) = q_e(m) - m$.

Proof. Let $n = \mathrm{ch}(\sigma)$. By definition, $n \to \infty$ as $m \to \infty$. Lemmas 3 and 5
yield a non-adaptive winning strategy with $n$ questions for any state of type
$(1, m_{n,e}, \binom{m_{n,e}}{2}, \dots, \binom{m_{n,e}}{e})$. By Definition 2, $m \le m_{n,e}$, and a fortiori (by the
monotonicity property stated after (5)), for all
sufficiently large $m$, a non-adaptive winning strategy with $n$ questions exists for
any state of type $(1, m, \binom{m}{2}, \dots, \binom{m}{e})$.

5 Ulam-Renyi Game with Three Lies and Minimum 
Adaptiveness 

In this section we restrict to the particular case $e = 3$. We shall prove that for
all $m \ge 99$ perfect (hence, a fortiori, optimal) searching strategies do exist to
find an unknown $m$-bit number $x$, with minimum adaptiveness and up to 3 lies
in the answers. States have now the form $(A_0, A_1, A_2, A_3)$. Proceeding as in the
previous section, we may safely assume that after a first batch of $m$ non-adaptive
questions asking for the binary expansion of $x_*$ (the bitwise batch), the resulting
state $\sigma$ is of type $(1, m, \binom{m}{2}, \binom{m}{3})$ and character $n = \mathrm{ch}(\sigma) = q_3(m) - m$. We
shall show that a non-adaptive winning strategy for $\sigma$ with $n$ questions exists
for each $m \ge 99$ (we will not try to optimize this constant in the present version
of this paper). We shall use the following preliminary lemma.

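Lemma 6. Let $n$ and $m$ be arbitrary integers $\ge 1$. For $i = 1, 2$, let $C_i$ be an
$(n, M_i, d_i)$ code with $\mu(C_i) \ge g_i$, for some integers $M_1 \ge m$, $M_2 \ge \binom{m}{2}$, $d_i \ge
7 - 2i$, $g_i \ge 7 - i$. Let $\Delta(C_1, C_2) \ge 4$.

Then for all $j = 1, 2, 3, \dots$, there exists an $(n + 3j, M', 5)$ code $\mathcal{D}_1^{(j)}$ with
$M' \ge 2^j m$, $\mu(\mathcal{D}_1^{(j)}) \ge g_1$, together with an $(n + 3j, M'', 3)$ code $\mathcal{D}_2^{(j)}$ such
that $M'' \ge \binom{2^j m}{2}$, $\mu(\mathcal{D}_2^{(j)}) \ge g_2$ and $\Delta(\mathcal{D}_1^{(j)}, \mathcal{D}_2^{(j)}) \ge 4$.

Proof. Omitted.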



Lemma 7. For all $n \ge 19$ there is an $(n, M_1, d_1)$ code $C_{n,1}$ and an
$(n, M_2, d_2)$ code $C_{n,2}$ such that

$$M_1 \ge m_{n,3}, \quad d_1 \ge 5, \quad M_2 \ge \binom{m_{n,3}}{2}, \quad d_2 \ge 3,$$

$$\mu(C_{n,1}) \ge 6, \quad \mu(C_{n,2}) \ge 5, \quad \Delta(C_{n,1}, C_{n,2}) \ge 4.$$



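Proof. By direct inspection in [Table I-A, I-B], for $n = 20, 21, 22$, there exist
codes $\mathcal{D}_{n,1}$, $\mathcal{D}_{n,2}$, $\mathcal{D}_{n,3}$ such that

(i) $\mathcal{D}_{n,1}$ is an $(n, M_{n,1}, 6)$ code and $w_H(x) = 6$ for any $x \in \mathcal{D}_{n,1}$,

(ii) $\mathcal{D}_{n,2}$ is an $(n, M_{n,2}, 4)$ code and $w_H(x) = 10$ for any $x \in \mathcal{D}_{n,2}$,

(iii) $\mathcal{D}_{n,3}$ is an $(n, M_{n,3}, 4)$ code and $w_H(x) = 13$ for any $x \in \mathcal{D}_{n,3}$.

Moreover,

$$M_{n,1} \ge \sqrt[3]{6}\, 2^{n/3} + 3 > m_{n,3}$$

and

$$M_{n,2} + M_{n,3} > \binom{m_{n,3}}{2}.$$

It is apparent that $\Delta(\mathcal{D}_{n,2}, \mathcal{D}_{n,3}) \ge 3$. Define $C_{n,1} = \mathcal{D}_{n,1}$ and $C_{n,2} = \mathcal{D}_{n,2} \cup \mathcal{D}_{n,3}$.
Trivially, $\mu(C_{n,1}) \ge 6$ and $\mu(C_{n,2}) \ge 5$. Moreover, since codewords of $C_{n,1}$ have
weight 6 while codewords of $C_{n,2}$ have weight 10 or 13, we have $\Delta(C_{n,1}, C_{n,2}) \ge 4$.
Hence the claim holds for $n = 20, 21, 22$.

For any $n \ge 23$, write $n = n' + 3j$ with $n' \in \{20, 21, 22\}$ and $j \ge 1$. Then by
Lemma 6 there exist an $(n, M', 5)$ code $C_{n,1}$ with

$$M' \ge m_{n'+3j,3} = m_{n,3}$$

and an $(n, M'', 3)$ code $C_{n,2}$ with

$$M'' \ge \binom{m_{n,3}}{2}$$

such that $\mu(C_{n,1}) \ge 6$, $\mu(C_{n,2}) \ge 5$ and $\Delta(C_{n,1}, C_{n,2}) \ge 4$. Hence the desired
result holds for all $n \ge 20$.

For the remaining case $n = 19$, direct inspection in [Table I-A, I-B] again
yields three codes $\mathcal{D}_{n,i}$ ($i = 1, 2, 3$) as above, with $M_{n,1} = 172 > 127 = m_{19,3}$
and $M_{n,2} + M_{n,3} = 8322 > 8001 = \binom{m_{19,3}}{2}$. This concludes the proof.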



Corollary 2. Fix an integer $m \ge 99$, and let $\sigma$ be an arbitrary state of type
$(1, m, \binom{m}{2}, \binom{m}{3})$. Then there exists a perfect non-adaptive winning strategy $\mathcal{S}$ for
$\sigma$ (in the sense that the number of questions in $\mathcal{S}$ coincides with Berlekamp's
lower bound $\mathrm{ch}(\sigma) = q_3(m) - m$).

Proof. Let $n = \mathrm{ch}(\sigma)$. From the assumption $m \ge 99$ by direct inspection, we get
$n \ge 19$. Lemma 7 yields an $(n, a_1, d_1)$ code $\mathcal{D}_1$ with $a_1 \ge m_{n,3}$, $\mu(\mathcal{D}_1) \ge
6$, $d_1 \ge 5$ together with an $(n, a_2, d_2)$ code $\mathcal{D}_2$ with $a_2 \ge \binom{m_{n,3}}{2}$, $\mu(\mathcal{D}_2) \ge
5$, $d_2 \ge 3$ satisfying the inequality $\Delta(\mathcal{D}_1, \mathcal{D}_2) \ge 4$. By definition, $m \le
m_{n,3}$. Pick subcodes $C_1 \subseteq \mathcal{D}_1$ and $C_2 \subseteq \mathcal{D}_2$ such that $|C_1| = m$ and
$|C_2| = \binom{m}{2}$. Finally let the $(n, 1, 7)$ code $C_0$ be defined by $C_0 = \{0 \cdots 0\}$. Then
the desired conclusion directly follows by Lemma 3 using the family of codes
$\Gamma = \{C_0, C_1, C_2\}$.
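The numerical claims used for $e = 3$ are easy to re-check. The sketch below (function names ours) computes the character of a state of type $(1, m, \binom{m}{2}, \binom{m}{3})$ and lists the distance requirements that Lemma 3 imposes on $C_0, C_1, C_2$.

```python
from math import comb


def weight(sizes, q):
    """Berlekamp weight of a state with the given level sizes, formula (4)."""
    e = len(sizes) - 1
    return sum(a * sum(comb(q, j) for j in range(e - i + 1)) for i, a in enumerate(sizes))


def character(sizes):
    """Smallest q >= 0 with weight(sizes, q) <= 2^q."""
    q = 0
    while weight(sizes, q) > 2 ** q:
        q += 1
    return q


def type_after_bitwise_batch(m, e=3):
    """Type (1, m, C(m,2), ..., C(m,e)) reached after the first batch."""
    return tuple(comb(m, i) for i in range(e + 1))


def required_distances(e):
    """Lower bounds 2(e - i) + 1 on the minimum distance of C_i, i = 0..e-1."""
    return [2 * (e - i) + 1 for i in range(e)]


def required_separations(e):
    """Lower bounds 2e - (i + j) + 1 on Delta(C_i, C_j) for 0 <= i < j <= e-1."""
    return {(i, j): 2 * e - (i + j) + 1 for i in range(e) for j in range(i + 1, e)}


if __name__ == "__main__":
    print(character(type_after_bitwise_batch(98)), character(type_after_bitwise_batch(99)))
    print(required_distances(3), required_separations(3))
```

With $C_0 = \{0 \cdots 0\}$, the separation bounds for the pairs $(0, 1)$ and $(0, 2)$ are exactly the minimum weight conditions $\mu(C_1) \ge 6$ and $\mu(C_2) \ge 5$ appearing in Lemma 7.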
6 Conclusions and Open Problems



For all sufficiently large search spaces we have proved the existence of perfect e-error
correcting search strategies where adaptiveness occurs only once. Our result
is optimal in that, by Tietavainen's theorem [28], for all $e > 1$ adaptiveness cannot
be further reduced without losing the property of perfectness. Our results also
suggest several interesting problems for future investigation.

The first problem is motivated by the asymmetric nature of the communica- 
tion between Questioner and Responder. Indeed, in our scenario the Questioner- 
to-Responder channel is noiseless, while the channel in the opposite direction is 
noisy. In the cooperative model where Questioner and Responder have agreed on 
the searching strategy, and lies are replaced by distortions, our results show that 
error correction can be achieved by, first sending m bits via the noisy Responder- 
to-Questioner channel, then sending via the noiseless channel the m-tuple of bits 
actually received by the Questioner, and finally, sending to the Questioner a final 
tip of $q_e(m) - m$ bits, again via the noisy channel. It seems reasonable to try to
limit the use of the noiseless channel, which in practice is the more costly chan- 
nel. The following problem is especially worthy of investigation: To which extent 
can one decrease the number of bits sent through the noiseless channel, while 
still keeping to a minimum both the total number of questions and the number of 
non-adaptive batches of questions? Trade-off results between the above parame- 
ters are also of interest. For general recent results on asymmetric communication 
channels see [Q. 

Following tradition, we have allowed questions to be arbitrary subsets of the 
search space. However, interesting research problems arise once one restricts the 
Questioner's expressive power. For instance, can our perfect, minimum adaptiveness
strategies be achieved by only using comparison questions and their
like? On the other hand, which sorts of perfect minimally
adaptive strategies exist in the model where the Responder is allowed a greater
expressive power than mere binary answers? It would also be
of interest to extend to $e > 3$ the non-asymptotic results of Section 5.

Finally, it would be interesting to extend our methods to other related problems
in the area of computing with unreliable tests.